112 research outputs found

Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories

Authors: Fang Zhaoyuan; Fragkiadaki Katerina; Harley Adam W.
Publication date: 08/04/2022

Tracking pixels in videos is typically studied as an optical flow estimation problem, where every pixel is described with a displacement vector that locates it in the next frame. Even though wider temporal context is freely available, prior efforts to take this into account have yielded only small gains over 2-frame methods. In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames. We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking, such as dense cost maps, iterative optimization, and learned appearance updates. We train our models using long-range amodal point trajectories mined from existing optical flow datasets that we synthetically augment with occlusions. We test our approach in trajectory estimation benchmarks and in keypoint label propagation tasks, and compare favorably against state-of-the-art optical flow and feature tracking methods.

arXiv.org e-Print Archive

Analogy-Forming Transformers for Few-Shot 3D Parsing

Authors: Fang Zhaoyuan; Fragkiadaki Katerina; Gkanatsios Nikolaos; Singh Mayank; Tulsiani Shubham
Publication date: 30/05/2023

We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism. By conditioning on more than one retrieved memory, the model predicts compositions of structures that mix and match parts across the retrieved memories. One-shot, few-shot or many-shot learning are treated uniformly in Analogical Networks, by conditioning on the appropriate set of memories, whether taken from a single, few or many memory exemplars, and inferring analogous parses. We show Analogical Networks are competitive with state-of-the-art 3D segmentation transformers in many-shot settings, and outperform them, as well as existing paradigms of meta-learning and few-shot learning, in few-shot settings. Analogical Networks successfully segment instances of novel object categories simply by expanding their memory, without any weight updates. Our code and models are publicly available on the project webpage: http://analogicalnets.github.io/.
Comment: ICLR 202

arXiv.org e-Print Archive

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Authors: Ambrus Rares; Fang Zhaoyuan; Fragkiadaki Katerina; Harley Adam W.; Li Jie
Publication date: 29/09/2022

Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect: even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to re-consider this commonly-neglected part of the sensor platform.

arXiv.org e-Print Archive

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

Authors: Fang Zhaoyuan; Fragkiadaki Katerina; Gupta Saurabh; Harley Adam W.; Sarch Gabriel; Schydlo Paul; Tarr Michael J.
Publication date: 21/07/2022

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors. TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible object contexts for them, localizes such contexts in the current scene, and repositions the objects. Commonsense priors are encoded in three modules: i) visuo-semantic detectors that detect out-of-place objects, ii) an associative neural graph memory of objects and spatial relations that proposes plausible semantic receptacles and surfaces for object repositions, and iii) a visual search network that guides the agent's exploration for efficiently localizing the receptacle-of-interest in the current scene to reposition the object. We test TIDEE on tidying up disorganized scenes in the AI2THOR simulation environment. TIDEE carries out the task directly from pixel and raw depth input without ever having observed the same room beforehand, relying only on priors learned from a separate set of training houses. Human evaluations on the resulting room reorganizations show TIDEE outperforms ablative versions of the model that do not use one or more of the commonsense priors. On a related room rearrangement benchmark that allows the agent to view the goal state prior to rearrangement, a simplified version of our model outperforms a top-performing method by a large margin. Code and data are available at the project website: https://tidee-agent.github.io/.

arXiv.org e-Print Archive

Transdifferentiation of lung adenocarcinoma in mice with Lkb1 deficiency to squamous cell carcinoma

Authors: Chen Haiquan; Fang Jing; Fang Rong; Fang Zhaoyuan; Gao Yijun; Ge Gaoxiang; Han Xiangkun; Hou Yingyong; Ji Hongbin; Li Fei; Li Fuming; Li Li; Ma Huimin; Sun Yihua; Wang Hongda; Wong Kwok-kin; Xiao Qian; Yao Shun; Zhang Lei; Zhang Wenjing
Publication venue: Springer Science and Business Media LLC
Publication date: 11/03/2014

Lineage transition in adenocarcinoma (ADC) and squamous cell carcinoma (SCC) of non-small cell lung cancer, as implicated by clinical observation of mixed ADC and SCC pathologies in adenosquamous cell carcinoma, remains a fundamental yet unsolved question. Here we provide in vivo evidence showing the transdifferentiation of lung cancer from ADC to SCC in mice: Lkb1-deficient lung ADC progressively transdifferentiates into SCC, via a pathologically mixed mAd-SCC intermediate. We find that reduction of lysyl oxidase (Lox) in Lkb1-deficient lung ADC decreases collagen disposition, triggers extracellular matrix remodelling, and upregulates expression of p63, an SCC lineage survival oncogene. Pharmacological Lox inhibition promotes the transdifferentiation, whereas ectopic Lox expression significantly inhibits this process. Notably, ADC and SCC show differential responses to Lox inhibition. Collectively, our findings demonstrate the de novo transdifferentiation of lung ADC to SCC in mice and provide mechanistic insight that may have important implications for lung cancer treatment.

Harvard University - DASH

Iris Liveness Detection Competition (LivDet-Iris) -- The 2020 Edition

Authors: Boutros Fadi; Bowyer Kevin; Boyd Aidan; Chen Cunjian; Czajka Adam; Damer Naser; Das Priyanka; Fang Meiling; Fang Zhaoyuan; Gonzalez Sebastian; Jang Ganghee; Kuijper Arjan; Maciejewicz Piotr; Marcel Sébastien; McGrath Joseph; Mohammadi Amir; Purnapatra Sandip; Ross Arun; Schuckers Stephanie; Sharma Renu; Tapia Juan; Trokielewicz Mateusz; Yambay David
Publication date: 01/01/2020

Launched in 2013, LivDet-Iris is an international competition series open to academia and industry with the aim of assessing and reporting advances in iris Presentation Attack Detection (PAD). This paper presents results from the fourth competition of the series: LivDet-Iris 2020. This year's competition introduced several novel elements: (a) it incorporated new types of attacks (samples displayed on a screen, cadaver eyes and prosthetic eyes), (b) it initiated LivDet-Iris as an on-going effort, with a testing protocol now available to everyone via the Biometrics Evaluation and Testing (BEAT, https://www.idiap.ch/software/beat/) open-source platform to facilitate reproducibility and continuous benchmarking of new algorithms, and (c) it compared the performance of the submitted entries with three baseline methods (offered by the University of Notre Dame and Michigan State University) and three open-source iris PAD methods available in the public domain. The best performing entry to the competition reported a weighted average APCER of 59.10% and a BPCER of 0.46% over all five attack types. This paper serves as the latest evaluation of iris PAD on a large spectrum of presentation attack instruments.
Comment: 9 pages, 3 figures, 3 tables, Accepted for presentation at International Joint Conference on Biometrics (IJCB 2020

arXiv.org e-Print Archive

TUbiblio

Crossref