7,854 research outputs found
SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
The Internet has enabled the creation of a growing number of large-scale
knowledge bases in a variety of domains containing complementary information.
Tools for automatically aligning these knowledge bases would make it possible
to unify many sources of structured knowledge and answer complex queries.
However, the efficient alignment of large-scale knowledge bases still poses a
considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a
simple algorithm for aligning knowledge bases with millions of entities and
facts. SiGMa is an iterative propagation algorithm which leverages both the
structural information from the relationship graph as well as flexible
similarity measures between entity properties in a greedy local search, thus
making it scalable. Despite its greedy nature, our experiments indicate that
SiGMa can efficiently match some of the world's largest knowledge bases with
high precision. We provide additional experiments on benchmark datasets which
demonstrate that SiGMa can outperform state-of-the-art approaches both in
accuracy and efficiency.Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprin
Unsupervised Learning from Narrated Instruction Videos
We address the problem of automatically learning the main steps to complete a
certain task, such as changing a car tire, from a set of narrated instruction
videos. The contributions of this paper are three-fold. First, we develop a new
unsupervised learning approach that takes advantage of the complementary nature
of the input video and the associated narration. The method solves two
clustering problems, one in text and one in video, applied one after each other
and linked by joint constraints to obtain a single coherent sequence of steps
in both modalities. Second, we collect and annotate a new challenging dataset
of real-world instruction videos from the Internet. The dataset contains about
800,000 frames for five different tasks that include complex interactions
between people and objects, and are captured in a variety of indoor and outdoor
settings. Third, we experimentally demonstrate that the proposed method can
automatically discover, in an unsupervised manner, the main steps to achieve
the task and locate the steps in the input videos.Comment: Appears in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2016). 21 page
On Correcting Inputs: Inverse Optimization for Online Structured Prediction
Algorithm designers typically assume that the input data is correct, and then
proceed to find "optimal" or "sub-optimal" solutions using this input data.
However this assumption of correct data does not always hold in practice,
especially in the context of online learning systems where the objective is to
learn appropriate feature weights given some training samples. Such scenarios
necessitate the study of inverse optimization problems where one is given an
input instance as well as a desired output and the task is to adjust the input
data so that the given output is indeed optimal. Motivated by learning
structured prediction models, in this paper we consider inverse optimization
with a margin, i.e., we require the given output to be better than all other
feasible outputs by a desired margin. We consider such inverse optimization
problems for maximum weight matroid basis, matroid intersection, perfect
matchings, minimum cost maximum flows, and shortest paths and derive the first
known results for such problems with a non-zero margin. The effectiveness of
these algorithmic approaches to online learning for structured prediction is
also discussed.Comment: Conference version to appear in FSTTCS, 201
Submodular Function Maximization for Group Elevator Scheduling
We propose a novel approach for group elevator scheduling by formulating it
as the maximization of submodular function under a matroid constraint. In
particular, we propose to model the total waiting time of passengers using a
quadratic Boolean function. The unary and pairwise terms in the function denote
the waiting time for single and pairwise allocation of passengers to elevators,
respectively. We show that this objective function is submodular. The matroid
constraints ensure that every passenger is allocated to exactly one elevator.
We use a greedy algorithm to maximize the submodular objective function, and
derive provable guarantees on the optimality of the solution. We tested our
algorithm using Elevate 8, a commercial-grade elevator simulator that allows
simulation with a wide range of elevator settings. We achieve significant
improvement over the existing algorithms.Comment: 10 pages; 2017 International Conference on Automated Planning and
Scheduling (ICAPS
Computing Similarity between a Pair of Trajectories
With recent advances in sensing and tracking technology, trajectory data is
becoming increasingly pervasive and analysis of trajectory data is becoming
exceedingly important. A fundamental problem in analyzing trajectory data is
that of identifying common patterns between pairs or among groups of
trajectories. In this paper, we consider the problem of identifying similar
portions between a pair of trajectories, each observed as a sequence of points
sampled from it.
We present new measures of trajectory similarity --- both local and global
--- between a pair of trajectories to distinguish between similar and
dissimilar portions. Our model is robust under noise and outliers, it does not
make any assumptions on the sampling rates on either trajectory, and it works
even if they are partially observed. Additionally, the model also yields a
scalar similarity score which can be used to rank multiple pairs of
trajectories according to similarity, e.g. in clustering applications. We also
present efficient algorithms for computing the similarity under our measures;
the worst-case running time is quadratic in the number of sample points.
Finally, we present an extensive experimental study evaluating the
effectiveness of our approach on real datasets, comparing with it with earlier
approaches, and illustrating many issues that arise in trajectory data. Our
experiments show that our approach is highly accurate in distinguishing similar
and dissimilar portions as compared to earlier methods even with sparse
sampling
- …