31,028 research outputs found
Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow
Dense visual correspondence plays a vital role in robotic perception. This
work focuses on establishing the dense correspondence between a pair of images
that captures dynamic scenes undergoing substantial transformations. We
introduce Doduo to learn general dense visual correspondence from in-the-wild
images and videos without ground truth supervision. Given a pair of images, it
estimates the dense flow field encoding the displacement of each pixel in one
image to its corresponding pixel in the other image. Doduo uses flow-based
warping to acquire supervisory signals for the training. Incorporating semantic
priors with self-supervised flow training, Doduo produces accurate dense
correspondence robust to the dynamic changes of the scenes. Trained on an
in-the-wild video dataset, Doduo illustrates superior performance on
point-level correspondence estimation over existing self-supervised
correspondence learning baselines. We also apply Doduo to articulation
estimation and zero-shot goal-conditioned manipulation, underlining its
practical applications in robotics. Code and additional visualizations are
available at https://ut-austin-rpl.github.io/DoduoComment: Project website: https://ut-austin-rpl.github.io/Dodu
GPS-GLASS: Learning Nighttime Semantic Segmentation Using Daytime Video and GPS data
Semantic segmentation for autonomous driving should be robust against various
in-the-wild environments. Nighttime semantic segmentation is especially
challenging due to a lack of annotated nighttime images and a large domain gap
from daytime images with sufficient annotation. In this paper, we propose a
novel GPS-based training framework for nighttime semantic segmentation. Given
GPS-aligned pairs of daytime and nighttime images, we perform cross-domain
correspondence matching to obtain pixel-level pseudo supervision. Moreover, we
conduct flow estimation between daytime video frames and apply GPS-based
scaling to acquire another pixel-level pseudo supervision. Using these pseudo
supervisions with a confidence map, we train a nighttime semantic segmentation
network without any annotation from nighttime images. Experimental results
demonstrate the effectiveness of the proposed method on several nighttime
semantic segmentation datasets. Our source code is available at
https://github.com/jimmy9704/GPS-GLASS.Comment: ICCVW 202
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus
DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
Techniques for dense semantic correspondence have provided limited ability to
deal with the geometric variations that commonly exist between semantically
similar images. While variations due to scale and rotation have been examined,
there lack practical solutions for more complex deformations such as affine
transformations because of the tremendous size of the associated solution
space. To address this problem, we present a discrete-continuous transformation
matching (DCTM) framework where dense affine transformation fields are inferred
through a discrete label optimization in which the labels are iteratively
updated via continuous regularization. In this way, our approach draws
solutions from the continuous space of affine transformations in a manner that
can be computed efficiently through constant-time edge-aware filtering and a
proposed affine-varying CNN-based descriptor. Experimental results show that
this model outperforms the state-of-the-art methods for dense semantic
correspondence on various benchmarks
Proposal Flow
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout.~Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, that exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that proposal flow can effectively be
transformed into a conventional dense flow field. We introduce a new dataset
that can be used to evaluate both general semantic flow techniques and
region-based approaches such as proposal flow. We use this benchmark to compare
different matching algorithms, object proposals, and region features within
proposal flow, to the state of the art in semantic flow. This comparison, along
with experiments on standard datasets, demonstrates that proposal flow
significantly outperforms existing semantic flow methods in various settings
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM
- …