Path Queries on Functions
Let f : [1..n] -> [1..n] be a function, and l : [1..n] -> [1..s] indicate a label assigned to each element of the domain. We design several compact data structures that answer various queries on the labels of paths in f. For example, we can find the minimum label in f^k(i) for a given i and any k >= 0 in a given range [k1..k2], using n lg n + O(n) bits, or the minimum label in f^(-k)(i) for a given i and k > 0, using 2n lg n + O(n) bits, both in time O(lg n / lg lg n). By using n lg s + o(n lg s) further bits, we can also count, within the same time, the number of elements within a range of labels, and report each such element in O(1 + lg s / lg lg n) additional time. Several other possible queries are considered, such as top-t queries and t-majorities.
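To make the first query concrete, the following is a minimal brute-force sketch, not the compact data structure from the abstract. It walks the path i, f(i), f(f(i)), ... directly and therefore takes O(k2) time per query instead of O(lg n / lg lg n). The function name `min_label_on_path` and the 0-based list encoding of f and l are assumptions made for illustration.

```python
def min_label_on_path(f, label, i, k1, k2):
    """Return the minimum label among f^k(i) for k1 <= k <= k2.

    f and label are 0-indexed lists (an assumption; the abstract uses [1..n]).
    This is a naive baseline that iterates the function step by step.
    """
    node = i
    # Advance to f^k1(i).
    for _ in range(k1):
        node = f[node]
    best = label[node]
    # Visit f^(k1+1)(i) .. f^k2(i), tracking the smallest label seen.
    for _ in range(k2 - k1):
        node = f[node]
        best = min(best, label[node])
    return best


# Hypothetical example: f maps 0->1, 1->2, 2->0, 3->2.
f = [1, 2, 0, 2]
label = [5, 3, 4, 1]
print(min_label_on_path(f, label, 3, 0, 2))  # path 3, 2, 0 with labels 1, 4, 5
```

A baseline like this is mainly useful as a reference when testing a faster structure against the same queries.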
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called a knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node/edge is labeled with a type and text. A
pattern is an aggregation of subtrees which contain all keywords in the texts
and have the same structure and types on nodes/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct an extensive experimental study to
compare our approaches with a naive adaptation of known techniques.
Comment: VLDB 201
S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification
This paper investigates the problem of active learning for binary label
prediction on a graph. We introduce a simple and label-efficient algorithm
called S2 for this task. At each step, S2 selects the vertex to be labeled
based on the structure of the graph and all previously gathered labels.
Specifically, S2 queries for the label of the vertex that bisects the *shortest
shortest* path between any pair of oppositely labeled vertices. We present a
theoretical estimate of the number of queries S2 needs in terms of a novel
parametrization of the complexity of binary functions on graphs. We also
present experimental results demonstrating the performance of S2 on both real
and synthetic data. While other graph-based active learning algorithms have
shown promise in practice, our algorithm is the first with both good
performance and theoretical guarantees. Finally, we demonstrate the
implications of the S2 algorithm for the theory of nonparametric active
learning. In particular, we show that S2 achieves near minimax optimal excess
risk for an important class of nonparametric classification problems.
Comment: A version of this paper appears in the Conference on Learning Theory
(COLT) 201
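The selection rule described in this abstract (query the vertex that bisects the shortest among all shortest paths between oppositely labeled vertices) can be sketched as follows. This is an illustrative single step only, under stated assumptions: the graph is unweighted and given as an adjacency dict, labels are +1/-1, and the function returns None when no such path exists or when the path has no interior vertex. How S2 itself handles those cases is not stated in the abstract. The name `s2_bisection_candidate` is hypothetical.

```python
from collections import deque


def s2_bisection_candidate(adj, labels):
    """Return the midpoint of the shortest path joining a +1 and a -1 vertex.

    adj:    dict mapping each vertex to an iterable of neighbours (unweighted).
    labels: dict mapping already-labeled vertices to +1 or -1.
    Returns None if no path exists or the path has no interior vertex.
    """
    # Multi-source BFS from every +1 vertex; the first -1 vertex dequeued
    # is the closest one, so its path is a shortest shortest path.
    sources = [v for v, y in labels.items() if y == +1]
    parent = {v: None for v in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if labels.get(u) == -1:
            # Reconstruct the path from the -1 endpoint back to a +1 source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            if len(path) < 3:
                return None  # endpoints are adjacent: nothing to bisect
            return path[len(path) // 2]
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None


# Hypothetical example: a path graph 0-1-2-3-4 with the two ends labeled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(s2_bisection_candidate(adj, {0: +1, 4: -1}))
```

Repeating this step after each new label halves the remaining path, which is the binary-search behaviour the bisection rule suggests.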
Improving Function Coverage with Munch: A Hybrid Fuzzing and Directed Symbolic Execution Approach
Fuzzing and symbolic execution are popular techniques for finding
vulnerabilities and generating test-cases for programs. Fuzzing, a blackbox
method that mutates seed input values, is generally incapable of generating
diverse inputs that exercise all paths in the program. Due to the
path-explosion problem and dependence on SMT solvers, symbolic execution may
also not achieve high path coverage. A hybrid technique involving fuzzing and
symbolic execution may achieve better function coverage than fuzzing or
symbolic execution alone. In this paper, we present Munch, an open source
framework implementing two hybrid techniques based on fuzzing and symbolic
execution. We empirically show using nine large open-source programs that
overall, Munch achieves higher (in-depth) function coverage than symbolic
execution or fuzzing alone. Using metrics based on total analysis time and
number of queries issued to the SMT solver, we also show that Munch is more
efficient at achieving better function coverage.
Comment: To appear at 33rd ACM/SIGAPP Symposium On Applied Computing (SAC). To
be held from 9th to 13th April, 201
Dynamic Time-Dependent Route Planning in Road Networks with User Preferences
There has been tremendous progress in algorithmic methods for computing
driving directions on road networks. Most of that work focuses on
time-independent route planning, where it is assumed that the cost on each arc
is constant per query. In practice, the current traffic situation significantly
influences the travel time on large parts of the road network, and it changes
over the day. One can distinguish between traffic congestion that can be
predicted using historical traffic data, and congestion due to unpredictable
events, e.g., accidents. In this work, we study the dynamic and
time-dependent route planning problem, which takes both prediction (based on
historical data) and live traffic into account. To this end, we propose a
practical algorithm that, while robust to user preferences, is able to
integrate global changes of the time-dependent metric (e.g., due to traffic
updates or user restrictions) faster than previous approaches, while allowing
subsequent queries that enable interactive applications.
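As background for the time-dependent setting, the following is a minimal sketch of a textbook time-dependent Dijkstra search, not the algorithm proposed in this abstract. It assumes each arc carries a travel-time function of the entry time and that these functions satisfy the FIFO property (departing later never lets you arrive earlier). The graph encoding and the name `earliest_arrival` are assumptions made for illustration.

```python
import heapq


def earliest_arrival(graph, source, target, depart):
    """Earliest arrival time at target when leaving source at time depart.

    graph: dict mapping u to a list of (v, travel_time) pairs, where
           travel_time(t) is the arc's travel time when entered at time t.
    Assumes FIFO travel-time functions. Returns None if target is unreachable.
    """
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, travel_time in graph.get(u, []):
            # The arc cost is evaluated at the time we actually reach u.
            arrival = t + travel_time(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None


# Hypothetical network: arc a->b is congested from time 8 onwards.
graph = {
    "a": [("b", lambda t: 10 if t < 8 else 30), ("c", lambda t: 15)],
    "b": [("d", lambda t: 5)],
    "c": [("d", lambda t: 5)],
}
print(earliest_arrival(graph, "a", "d", 0))
print(earliest_arrival(graph, "a", "d", 8))
```

In this toy network the best route changes with the departure time, which is the effect that time-independent route planning cannot capture.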