Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories
Tracking pixels in videos is typically studied as an optical flow estimation
problem, where every pixel is described with a displacement vector that locates
it in the next frame. Even though wider temporal context is freely available,
prior efforts to take this into account have yielded only small gains over
2-frame methods. In this paper, we revisit Sand and Teller's "particle video"
approach, and study pixel tracking as a long-range motion estimation problem,
where every pixel is described with a trajectory that locates it in multiple
future frames. We re-build this classic approach using components that drive
the current state-of-the-art in flow and object tracking, such as dense cost
maps, iterative optimization, and learned appearance updates. We train our
models using long-range amodal point trajectories mined from existing optical
flow datasets that we synthetically augment with occlusions. We test our
approach in trajectory estimation benchmarks and in keypoint label propagation
tasks, and compare favorably against state-of-the-art optical flow and feature
tracking methods.
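The contrast the abstract draws, between 2-frame flow and per-pixel trajectories, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `chain_flow` shows the naive baseline of chaining adjacent-frame displacements (which drifts through occlusions), while the arrays below it show the "particle video" style output of an amodal trajectory with per-frame visibility.

```python
import numpy as np

def chain_flow(flows, x0, y0):
    """Track one point by chaining 2-frame flow fields (the baseline the
    paper contrasts with). `flows` is a list of (H, W, 2) displacement
    maps; each step only sees adjacent frames, so occlusions cause drift."""
    x, y = float(x0), float(y0)
    traj = [(x, y)]
    for flow in flows:
        h, w = flow.shape[:2]
        xi = int(round(min(max(x, 0), w - 1)))  # nearest valid pixel
        yi = int(round(min(max(y, 0), h - 1)))
        dx, dy = flow[yi, xi]
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

# A "particle video" style output instead describes each pixel with a full
# trajectory over T frames plus visibility, defined even while occluded:
T = 8
trajectory = np.zeros((T, 2))          # (x, y) per frame, amodal
visibility = np.ones(T, dtype=bool)    # False where the point is occluded
```

The trajectory representation is what lets a tracker exploit wider temporal context: frames where the point is visible can constrain its location in frames where it is not.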
Analogy-Forming Transformers for Few-Shot 3D Parsing
We present Analogical Networks, a model that encodes domain knowledge
explicitly, in a collection of structured labelled 3D scenes, in addition to
implicitly, as model parameters, and segments 3D object scenes with analogical
reasoning: instead of mapping a scene to part segments directly, our model
first retrieves related scenes from memory and their corresponding part
structures, and then predicts analogous part structures for the input scene,
via an end-to-end learnable modulation mechanism. By conditioning on more than
one retrieved memory, the model predicts compositions of structures that mix
and match parts across the retrieved memories. One-shot, few-shot and many-shot
learning are treated uniformly in Analogical Networks, by conditioning on the
appropriate set of memories, whether taken from a single, few or many memory
exemplars, and inferring analogous parses. We show Analogical Networks are
competitive with state-of-the-art 3D segmentation transformers in many-shot
settings, and outperform them, as well as existing paradigms of meta-learning
and few-shot learning, in few-shot settings. Analogical Networks successfully
segment instances of novel object categories simply by expanding their memory,
without any weight updates. Our code and models are publicly available in the
project webpage: http://analogicalnets.github.io/.
Comment: ICLR 202
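The retrieval step the abstract describes can be sketched concretely. This is a hedged illustration: the scene encoder that produces the embeddings and the modulation network that maps retrieved part structures onto the query are the paper's learned components and are elided here; only the memory lookup itself is shown.

```python
import numpy as np

def retrieve_memories(query_feat, memory_feats, k=3):
    """Pick the k most similar labelled scenes from memory by cosine
    similarity of global scene embeddings. Expanding `memory_feats`
    with new exemplars is what lets such a model handle novel
    categories without weight updates."""
    q = query_feat / np.linalg.norm(query_feat)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity to every memory
    return np.argsort(-sims)[:k]       # indices of the k nearest memories
```

One-shot, few-shot, and many-shot settings then differ only in how many memories the downstream modulation step is conditioned on.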
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Building 3D perception systems for autonomous vehicles that do not rely on
high-density LiDAR is a critical research problem because of the expense of
LiDAR systems compared to cameras and other sensors. Recent research has
developed a variety of camera-only methods, where features are differentiably
"lifted" from the multi-camera images onto the 2D ground plane, yielding a
"bird's eye view" (BEV) feature representation of the 3D space around the
vehicle. This line of work has produced a variety of novel "lifting" methods,
but we observe that other details in the training setups have shifted at the
same time, making it unclear what really matters in top-performing methods. We
also observe that using cameras alone is not a real-world constraint,
considering that additional sensors like radar have been integrated into real
vehicles for years already. In this paper, we first attempt to elucidate
the high-impact factors in the design and training protocol of BEV perception
models. We find that batch size and input resolution greatly affect
performance, while lifting strategies have a more modest effect -- even a
simple parameter-free lifter works well. Second, we demonstrate that radar data
can provide a substantial boost to performance, helping to close the gap
between camera-only and LiDAR-enabled systems. We analyze the radar usage
details that lead to good performance, and invite the community to re-consider
this commonly neglected part of the sensor platform.
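A "simple parameter-free lifter" of the kind the abstract favours can be sketched as follows. This is an illustrative reduction, with assumed names throughout: project 3D voxel centers into the image with the pinhole intrinsics and gather the feature at each projection (nearest-neighbour gather here for brevity, where a real lifter would typically use bilinear sampling and combine multiple cameras).

```python
import numpy as np

def lift_to_bev(feat_2d, K, xyz_cam):
    """Parameter-free lifting sketch: project 3D points (voxel centers
    in camera coordinates, shape (N, 3)) through intrinsics K and gather
    the 2D feature map (H, W, C) at each projection. Points behind the
    camera or outside the image receive zeros."""
    H, W, C = feat_2d.shape
    out = np.zeros((len(xyz_cam), C))
    uvw = (K @ xyz_cam.T).T                  # (N, 3) homogeneous pixels
    valid = uvw[:, 2] > 1e-6                 # in front of the camera
    u = np.round(uvw[valid, 0] / uvw[valid, 2]).astype(int)
    v = np.round(uvw[valid, 1] / uvw[valid, 2]).astype(int)
    inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)[inb]
    out[idx] = feat_2d[v[inb], u[inb]]       # nearest-neighbour gather
    return out
```

The lifted per-voxel features are then pooled over height to form the BEV grid; no learned parameters are involved in the lifting itself, which is the point the abstract's ablation makes.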
TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
We introduce TIDEE, an embodied agent that tidies up a disordered scene based
on learned commonsense object placement and room arrangement priors. TIDEE
explores a home environment, detects objects that are out of their natural
place, infers plausible object contexts for them, localizes such contexts in
the current scene, and repositions the objects. Commonsense priors are encoded
in three modules: i) visuo-semantic detectors that detect out-of-place objects,
ii) an associative neural graph memory of objects and spatial relations that
proposes plausible semantic receptacles and surfaces for object repositions,
and iii) a visual search network that guides the agent's exploration for
efficiently localizing the receptacle-of-interest in the current scene to
reposition the object. We test TIDEE on tidying up disorganized scenes in the
AI2THOR simulation environment. TIDEE carries out the task directly from pixel
and raw depth input without ever having observed the same room beforehand,
relying only on priors learned from a separate set of training houses. Human
evaluations on the resulting room reorganizations show TIDEE outperforms
ablative versions of the model that do not use one or more of the commonsense
priors. On a related room rearrangement benchmark that allows the agent to view
the goal state prior to rearrangement, a simplified version of our model
significantly outperforms a top-performing method by a large margin. Code and
data are available at the project website: https://tidee-agent.github.io/.
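The three commonsense modules compose into a simple control loop. The sketch below shows only that control flow; every argument is a hypothetical callable standing in for one of the learned modules the abstract names (out-of-place detector, neural graph memory, visual search network) or for the agent's low-level skills.

```python
def tidy_room(observe, detect_out_of_place, propose_receptacle,
              search_for, reposition):
    """Control-flow sketch of the TIDEE pipeline: detect misplaced
    objects, propose a plausible receptacle for each, localize it in
    the current scene, and move the object there."""
    moved = []
    for obj in detect_out_of_place(observe()):   # visuo-semantic detectors
        receptacle = propose_receptacle(obj)     # neural graph memory
        location = search_for(receptacle)        # visual search network
        reposition(obj, location)
        moved.append((obj, receptacle))
    return moved
```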
Transdifferentiation of lung adenocarcinoma in mice with Lkb1 deficiency to squamous cell carcinoma
Lineage transition in adenocarcinoma (ADC) and squamous cell carcinoma (SCC) of non-small cell lung cancer, as implicated by clinical observation of mixed ADC and SCC pathologies in adenosquamous cell carcinoma, remains a fundamental yet unsolved question. Here we provide in vivo evidence showing the transdifferentiation of lung cancer from ADC to SCC in mice: Lkb1-deficient lung ADC progressively transdifferentiates into SCC via a pathologically mixed mAd-SCC intermediate. We find that reduction of lysyl oxidase (Lox) in Lkb1-deficient lung ADC decreases collagen deposition, triggers extracellular matrix remodelling, and upregulates expression of p63, an SCC lineage survival oncogene. Pharmacological Lox inhibition promotes the transdifferentiation, whereas ectopic Lox expression significantly inhibits this process. Notably, ADC and SCC show differential responses to Lox inhibition. Collectively, our findings demonstrate the de novo transdifferentiation of lung ADC to SCC in mice and provide mechanistic insight that may have important implications for lung cancer treatment.
Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition
Launched in 2013, LivDet-Iris is an international competition series open to
academia and industry with the aim to assess and report advances in iris
Presentation Attack Detection (PAD). This paper presents results from the
fourth competition of the series: LivDet-Iris 2020. This year's competition
introduced several novel elements: (a) incorporated new types of attacks
(samples displayed on a screen, cadaver eyes and prosthetic eyes), (b)
initiated LivDet-Iris as an ongoing effort, with a testing protocol now
available to everyone via the Biometrics Evaluation and Testing (BEAT)
open-source platform (https://www.idiap.ch/software/beat/) to facilitate
reproducibility and benchmarking of new algorithms continuously, and (c)
performance comparison of the submitted entries with three baseline methods
(offered by the University of Notre Dame and Michigan State University), and
three open-source iris PAD methods available in the public domain. The best
performing entry to the competition reported a weighted average APCER of
59.10% and a BPCER of 0.46% over all five attack types. This paper serves as
the latest evaluation of iris PAD on a large spectrum of presentation attack
instruments.
Comment: 9 pages, 3 figures, 3 tables. Accepted for presentation at the
International Joint Conference on Biometrics (IJCB 2020).
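The headline numbers combine per-attack-type error rates, which a short sketch makes concrete. This is an illustrative computation, not the competition's scoring code: APCER (attack presentation classification error rate) is averaged over attack types with weights proportional to sample counts; the exact weighting in the LivDet-Iris protocol may differ.

```python
def weighted_apcer(apcer_per_attack, counts):
    """Weighted-average APCER across attack types, weighting each
    type's error rate by its number of test samples. BPCER (the
    bona-fide error rate) is reported separately, as in the abstract."""
    total = sum(counts)
    return sum(a * n for a, n in zip(apcer_per_attack, counts)) / total
```

The gap between a high weighted APCER and a very low BPCER, as reported above, indicates systems tuned to rarely reject genuine irises at the cost of admitting many attacks.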