DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency
We present an unsupervised learning framework for simultaneously training
single-view depth prediction and optical flow estimation models using unlabeled
video sequences. Existing unsupervised methods often exploit brightness
constancy and spatial smoothness priors to train depth or flow models. In this
paper, we propose to leverage geometric consistency as additional supervisory
signals. Our core idea is that for rigid regions we can use the predicted scene
depth and camera motion to synthesize 2D optical flow by backprojecting the
induced 3D scene flow. The discrepancy between the rigid flow (from depth
prediction and camera motion) and the estimated flow (from optical flow model)
allows us to impose a cross-task consistency loss. While all the networks are
jointly optimized during training, they can be applied independently at test
time. Extensive experiments demonstrate that our depth and flow models compare
favorably with state-of-the-art unsupervised methods.
Comment: ECCV 2018. Project website: http://yuliang.vision/DF-Net/ Code:
https://github.com/vt-vl-lab/DF-Ne
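For the DF-Net entry above, the rigid flow at a single pixel can be sketched as follows. This is only an illustration under stated assumptions: a pinhole camera, a known relative pose, and an L1 penalty. The function names, the intrinsics, and the numbers are hypothetical and are not taken from the paper or its code.

```python
def rigid_flow(pixel, depth, intrinsics, rotation, translation):
    """Flow induced at one pixel by camera motion over a static scene point."""
    fx, fy, cx, cy = intrinsics
    u, v = pixel
    # Back-project the pixel to a 3D point in the first camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Express the point in the second camera frame: X' = R X + t.
    moved = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
             for i in range(3)]
    # Project into the second view and take the pixel displacement.
    u2 = fx * moved[0] / moved[2] + cx
    v2 = fy * moved[1] / moved[2] + cy
    return (u2 - u, v2 - v)


def consistency_penalty(rigid, estimated):
    """L1 discrepancy between the rigid flow and a flow model's estimate."""
    return abs(rigid[0] - estimated[0]) + abs(rigid[1] - estimated[1])


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (100.0, 100.0, 50.0, 50.0)  # hypothetical fx, fy, cx, cy

# No rotation; the point shifts 0.1 units along x and is seen at depth 2.
flow = rigid_flow((60.0, 40.0), 2.0, INTRINSICS, IDENTITY, [0.1, 0.0, 0.0])
penalty = consistency_penalty(flow, (4.0, 0.5))
```

With these numbers the rigid flow is about 5 pixels along x and 0 along y, so an estimated flow of (4.0, 0.5) is penalized by roughly 1.5.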
Label Efficient Learning of Transferable Representations across Domains and Tasks
We propose a framework that learns a representation transferable across
different domains and tasks in a label efficient manner. Our approach battles
domain shift with a domain adversarial loss and generalizes the embedding to a
novel task using a metric learning-based approach. Our model is simultaneously
optimized on labeled source data and unlabeled or sparsely labeled data in the
target domain. Our method shows compelling results on novel classes within a
new domain even when only a few labeled examples per class are available,
outperforming the prevalent fine-tuning approach. In addition, we demonstrate
the effectiveness of our framework on the transfer learning task from image
object recognition to video action recognition.
Comment: NIPS 201
Learning to Generate Long-term Future via Hierarchical Prediction
We propose a hierarchical approach for making long-term predictions of future
frames. To avoid inherent compounding errors in recursive pixel-level
prediction, we propose to first estimate high-level structure in the input
frames, then predict how that structure evolves in the future, and finally by
observing a single frame from the past and the predicted high-level structure,
we construct the future frames without having to observe any of the pixel-level
predictions. Long-term video prediction is difficult to perform by recurrently
observing the predicted frames because the small errors in pixel space
exponentially amplify as predictions are made deeper into the future. Our
approach prevents pixel-level error propagation from happening by removing the
need to observe the predicted frames. Our model is built with a combination of
LSTM and analogy based encoder-decoder convolutional neural networks, which
independently predict the video structure and generate the future frames,
respectively. In experiments, our model is evaluated on the Human3.6M and Penn
Action datasets on the task of long-term pixel-level video prediction of humans
performing actions and demonstrates significantly better results than the
state of the art.
Comment: International Conference on Machine Learning (ICML) 201
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
Recent years have witnessed rapid progress in detecting and recognizing
individual object instances. To understand the situation in a scene, however,
computers need to recognize how humans interact with surrounding objects. In
this paper, we tackle the challenging task of detecting human-object
interactions (HOI). Our core idea is that the appearance of a person or an
object instance contains informative cues on which relevant parts of an image
to attend to for facilitating interaction prediction. To exploit these cues, we
propose an instance-centric attention module that learns to dynamically
highlight regions in an image conditioned on the appearance of each instance.
Such an attention-based network allows us to selectively aggregate features
relevant for recognizing HOIs. We validate the efficacy of the proposed network
on the Verb in COCO and HICO-DET datasets and show that our approach compares
favorably with the state of the art.
Comment: BMVC 2018. Project webpage: https://gaochen315.github.io/iCAN/ Code:
https://github.com/vt-vl-lab/iCA
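For the iCAN entry above, attention conditioned on one instance can be sketched as below. The listing does not say how the attention weights are computed, so the dot-product scoring and softmax here are assumptions, and the names and feature values are hypothetical.

```python
import math


def attend(instance_feature, context_features):
    """Weight image locations by similarity to one instance, then pool them."""
    # Score each location by its dot product with the instance feature.
    scores = [sum(a * b for a, b in zip(instance_feature, feature))
              for feature in context_features]
    # Numerically stable softmax over locations.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of the location features.
    dim = len(instance_feature)
    pooled = [sum(w * feature[d] for w, feature in zip(weights, context_features))
              for d in range(dim)]
    return weights, pooled


person = [1.0, 0.0]                               # hypothetical instance feature
locations = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]  # hypothetical location features
weights, pooled = attend(person, locations)
```

The first location matches the instance feature most closely, so it receives most of the weight and dominates the pooled context feature.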
Visualizing an adjustable WO3/p-GaN heterojunction
p-n junctions based on conventional semiconductors are the elementary units
of modern electronic devices and the chip industry. However, the rectification
property of such a p-n junction is usually fixed once the unit is fabricated.
Here, we propose an adjustable n-WO3/p-GaN heterojunction with controllable
electronic properties. The as-prepared n-WO3/p-GaN heterojunction is almost
transparent and shows typical p-n junction rectification. When hydrogen atoms
are gradually doped into the WO3 layer by a facile electron-proton synergistic
route, the heterojunction can be turned dynamically, step by step, from the
typical p-n junction (n-WO3/p-GaN) to a standard Schottky contact (HxWO3/p-GaN).
More importantly, this evolution can be observed directly with the naked eye
owing to the pronounced electrochromic characteristic of the WO3 layer. By
connecting two HxWO3/p-GaN heterojunctions, controllable bi-functional
rectification can be achieved. In addition, the HxWO3/p-GaN heterojunction can
be recovered to the original p-n junction simply by annealing in ambient
conditions, demonstrating that the heterojunction is controllable and reusable.
The current study will open up tremendous opportunities for dynamic electronic
devices in the future.
Comment: 8 pages, 2 figures
DRG: Dual Relation Graph for Human-Object Interaction Detection
We tackle the challenging problem of human-object interaction (HOI)
detection. Existing methods either recognize the interaction of each
human-object pair in isolation or perform joint inference based on complex
appearance-based features. In this paper, we leverage an abstract
spatial-semantic representation to describe each human-object pair and
aggregate the contextual information of the scene via a dual relation graph
(one human-centric and one object-centric). Our proposed dual relation graph
effectively captures discriminative cues from the scene to resolve ambiguity
from local predictions. Our model is conceptually simple and leads to favorable
results compared to the state-of-the-art HOI detection algorithms on two
large-scale benchmark datasets.
Comment: ECCV 2020. Project: http://chengao.vision/DRG/ Code:
https://github.com/vt-vl-lab/DR
Gate-Controlled VO2 Phase Transition for High-Performance Smart Window
VO2 is a promising material for energy-saving "smart windows", owing
to its thermochromic property induced by the metal-insulator transition (MIT).
However, its practical application is greatly limited by the relatively high
critical transition temperature (~68 °C), low luminous transmittance (<60%) and
poor solar energy regulation ability (<15%). Here we develop reversible and
non-volatile electric-field control of the MIT in monoclinic VO2 film. With a
gating treatment assisted by a solid electrolyte layer, we modulate the
insertion/extraction of hydrogen into/from the VO2 lattice at room temperature,
causing tri-state phase transitions accompanied by controllable transmission
adjustment. The dramatic increase of visible/infrared transmittance during the
phase transition from the metallic (lightly H-doped) to the insulating (heavily
H-doped) phase leads to an increased solar energy regulation ability of up to
26.5%, while keeping 70.8% visible-luminous transmittance. These results beat
all previous records and even exceed the theoretical limit for traditional VO2
smart windows, removing intrinsic disadvantages of VO2 for energy-saving
applications. Our findings not only demonstrate an electric-field-controlled
phase modulation strategy, but also open the door to high-performance
VO2-based smart window applications.
Comment: 19 pages, 5 figures
Spatially-resolved insulator-metal transition for rewritable optical gratings
Doping is an effective way to tune the properties of metal oxides [1-5] for
achieving functional oxide electronics [6-8]. Previously we developed a
controllable hydrogen doping technology at ambient conditions using an
electron-proton synergistic doping strategy, which avoids the
high-temperature/pressure treatments required by traditional technologies [9].
Here, based on this facile doping route, we achieve a visual and reversible
insulator-metal transition (MIT) for tungsten trioxide (WO3) film. Its
spatial selectivity is comparable to that of standard UV lithography, which
shows its potential as a viable way to fabricate rewritable WO3 grating
devices. Furthermore, the period of the obtained WO3 structural grating can
also be easily changed as required by selecting the doped area. This
doping technology opens up alternative approaches for developing not only
optical devices, but also rewritable ion devices and integrated circuits for
various oxide electronics.
Comment: 16 pages, 4 figures
Electron-proton Co-doping Induced Metal-insulator Transition in VO2 Film via Surface Self-assembled Ascorbic Acid Molecules
Charge doping is an effective way to induce the metal-insulator transition (MIT)
in correlated materials for many important applications, but it is limited in
practice by low stability. In this study, we achieve pronounced phase
modulation and stabilize the metallic state of
monoclinic vanadium dioxide (VO2) at room temperature via a novel
electron-proton co-doping mechanism driven by surface adsorption of
self-assembled L-ascorbic acid (AA) molecules. The ionized AA- species in
solution donate electrons to the VO2 surface on which they adsorb, which then
electrostatically attracts surrounding protons to penetrate, eventually
resulting in stable hydrogen-doped metallic VO2. The variations of phase and
electronic structure, as well as the electron occupancy of V-3d/O-2p hybrid
orbitals, were examined by synchrotron characterization and first-principles
theoretical simulations, which explain the formation of the stable metallic
state. Importantly, the adsorbed molecules prevent hydrogen dopants from
escaping the lattice and thereby stabilize the metallic phase of VO2. Such an
electron-proton co-doping mechanism driven by adsorption of suitable molecules
would open a new door for engineering the properties of correlated oxide
materials.
Comment: 25 pages, 5 figures
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
Monocular visual odometry (VO) suffers severely from error accumulation
during frame-to-frame pose estimation. In this paper, we present a
self-supervised learning method for VO with special consideration for
consistency over longer sequences. To this end, we model the long-term
dependency in pose prediction using a pose network that features a two-layer
convolutional LSTM module. We train the networks with purely self-supervised
losses, including a cycle consistency loss that mimics the loop closure module
in geometric VO. Inspired by prior geometric systems, we allow the networks to
see beyond a small temporal window during training, through a novel loss that
incorporates temporally distant (e.g., O(100)) frames. Given GPU memory
constraints, we propose a stage-wise training mechanism, where the first stage
operates in a local time window and the second stage refines the poses with a
"global" loss given the first stage features. We demonstrate competitive
results on several standard VO datasets, including KITTI and TUM RGB-D.
Comment: ECCV 2020. Project page: https://yuliang.vision/LTMV
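For the visual odometry entry above, one way to read a cycle consistency loss on poses is that a forward motion composed with the predicted backward motion should return to the starting pose. The sketch below is a simplification under stated assumptions: it uses planar (x, y, heading) poses instead of full 6-DoF poses, and the function names and values are hypothetical rather than the paper's formulation.

```python
import math


def compose(a, b):
    """Apply planar motion a, then motion b expressed in a's end frame."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + math.cos(ath) * bx - math.sin(ath) * by,
            ay + math.sin(ath) * bx + math.cos(ath) * by,
            ath + bth)


def invert(pose):
    """Inverse motion: rotate the negated translation by the inverse rotation."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), -th)


def cycle_error(forward, backward):
    """Deviation of forward-then-backward motion from the identity pose."""
    x, y, th = compose(forward, backward)
    wrapped = math.atan2(math.sin(th), math.cos(th))  # keep the angle in (-pi, pi]
    return abs(x) + abs(y) + abs(wrapped)


forward = (1.0, 0.5, 0.3)         # hypothetical predicted motion, frame t to t+1
exact_back = invert(forward)      # a perfectly consistent backward prediction
drifted_back = (-0.9, -0.5, -0.3) # an inconsistent backward prediction

consistent = cycle_error(forward, exact_back)
inconsistent = cycle_error(forward, drifted_back)
```

A consistent pair gives an error near zero, while the drifted pair gives a positive error that a training loss could penalize.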