
    DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency

    We present an unsupervised learning framework for simultaneously training single-view depth prediction and optical flow estimation models using unlabeled video sequences. Existing unsupervised methods often exploit brightness constancy and spatial smoothness priors to train depth or flow models. In this paper, we propose to leverage geometric consistency as an additional supervisory signal. Our core idea is that for rigid regions we can use the predicted scene depth and camera motion to synthesize 2D optical flow by backprojecting the induced 3D scene flow. The discrepancy between the rigid flow (from depth prediction and camera motion) and the estimated flow (from the optical flow model) allows us to impose a cross-task consistency loss. While all the networks are jointly optimized during training, they can be applied independently at test time. Extensive experiments demonstrate that our depth and flow models compare favorably with state-of-the-art unsupervised methods. Comment: ECCV 2018. Project website: http://yuliang.vision/DF-Net/ Code: https://github.com/vt-vl-lab/DF-Net
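
    The cross-task consistency idea can be summarized in a short sketch. Below is a minimal, illustrative PyTorch-style version (not the authors' released code; the tensor shapes, homogeneous-pose convention, and rigid-region mask are assumptions): it synthesizes rigid flow from predicted depth and camera motion, then penalizes its discrepancy with the estimated flow.

    import torch

    def rigid_flow_from_depth(depth, K, T):
        # depth: (B, 1, H, W) predicted depth; K: (B, 3, 3) camera intrinsics;
        # T: (B, 4, 4) relative camera pose from the source to the target frame.
        B, _, H, W = depth.shape
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=depth.dtype, device=depth.device),
            torch.arange(W, dtype=depth.dtype, device=depth.device),
            indexing="ij",
        )
        ones = torch.ones_like(xs)
        pix = torch.stack([xs, ys, ones]).reshape(1, 3, -1).expand(B, -1, -1)  # (B, 3, HW)

        # Backproject pixels to 3D, move them with the camera motion, reproject to 2D.
        cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)              # (B, 3, HW)
        cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)           # (B, 4, HW)
        proj = K @ (T @ cam_h)[:, :3]                                          # (B, 3, HW)
        uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

        return (uv - pix[:, :2]).reshape(B, 2, H, W)                           # rigid flow

    def cross_task_consistency_loss(depth, K, T, flow_pred, rigid_mask):
        # L1 discrepancy between the rigid flow (depth + camera motion) and the
        # estimated flow, applied only inside the assumed rigid regions.
        rigid_flow = rigid_flow_from_depth(depth, K, T)
        return ((rigid_flow - flow_pred).abs() * rigid_mask).mean()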

    Label Efficient Learning of Transferable Representations across Domains and Tasks

    We propose a framework that learns a representation transferable across different domains and tasks in a label-efficient manner. Our approach battles domain shift with a domain adversarial loss and generalizes the embedding to novel tasks using a metric learning-based approach. Our model is simultaneously optimized on labeled source data and unlabeled or sparsely labeled data in the target domain. Our method shows compelling results on novel classes within a new domain, even when only a few labeled examples per class are available, outperforming the prevalent fine-tuning approach. In addition, we demonstrate the effectiveness of our framework on the transfer learning task from image object recognition to video action recognition. Comment: NIPS 2017
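
    A rough sketch of how the two losses described above could be combined is given below. It is an illustration under assumptions (a DANN-style gradient-reversal domain classifier and a simple temperature-scaled similarity term standing in for the paper's metric-learning objective), not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass, negated gradient in the backward pass."""
        @staticmethod
        def forward(ctx, x):
            return x
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output

    def domain_adversarial_loss(domain_head, feat_src, feat_tgt):
        # The domain classifier tries to tell source (0) from target (1) features;
        # the reversed gradient pushes the encoder toward domain-invariant embeddings.
        feats = GradReverse.apply(torch.cat([feat_src, feat_tgt], dim=0))
        logits = domain_head(feats).squeeze(1)
        labels = torch.cat([torch.zeros(len(feat_src)), torch.ones(len(feat_tgt))]).to(logits.device)
        return F.binary_cross_entropy_with_logits(logits, labels)

    def metric_embedding_loss(anchor, positive, negative, temperature=0.1):
        # Pull each embedding toward a same-class example and away from a
        # different-class example (a simplification of the paper's objective).
        pos = F.cosine_similarity(anchor, positive) / temperature
        neg = F.cosine_similarity(anchor, negative) / temperature
        target = torch.zeros(len(anchor), dtype=torch.long, device=anchor.device)
        return F.cross_entropy(torch.stack([pos, neg], dim=1), target)

    # domain_head could be, e.g., nn.Linear(feature_dim, 1) on top of the shared embedding.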

    Learning to Generate Long-term Future via Hierarchical Prediction

    We propose a hierarchical approach for making long-term predictions of future frames. To avoid the inherent compounding errors in recursive pixel-level prediction, we propose to first estimate the high-level structure in the input frames, then predict how that structure evolves in the future, and finally, by observing a single frame from the past and the predicted high-level structure, construct the future frames without having to observe any of the pixel-level predictions. Long-term video prediction is difficult to perform by recurrently observing the predicted frames because small errors in pixel space amplify exponentially as predictions are made deeper into the future. Our approach prevents pixel-level error propagation by removing the need to observe the predicted frames. Our model is built with a combination of LSTM and analogy-based encoder-decoder convolutional neural networks, which independently predict the video structure and generate the future frames, respectively. In experiments, our model is evaluated on the Human3.6M and Penn Action datasets on the task of long-term pixel-level video prediction of humans performing actions and demonstrates significantly better results than the state of the art. Comment: International Conference on Machine Learning (ICML) 2017
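
    As a rough illustration of the pipeline, the skeleton below rolls high-level structure forward with an LSTM and renders each future frame from the last observed frame plus the predicted structure, so pixel-level errors never feed back into the predictor. All module choices, sizes, and the 64x64 frame size are assumptions, not the paper's architecture (which uses pose landmarks and an analogy-based convolutional decoder).

    import torch
    import torch.nn as nn

    class HierarchicalPredictor(nn.Module):
        def __init__(self, struct_dim=32, hidden=128, frame_dim=3 * 64 * 64):
            super().__init__()
            self.struct_encoder = nn.Linear(frame_dim, struct_dim)     # stand-in structure estimator
            self.lstm = nn.LSTM(struct_dim, hidden, batch_first=True)  # predicts structure dynamics
            self.to_struct = nn.Linear(hidden, struct_dim)
            # Stand-in generator: render frame_t from (last observed frame, structure_t).
            self.generator = nn.Sequential(nn.Linear(struct_dim + frame_dim, frame_dim), nn.Sigmoid())

        def forward(self, frames, horizon):
            # frames: (B, T, 3, 64, 64) observed clip; returns (B, horizon, 3, 64, 64).
            B, T = frames.shape[:2]
            structs = self.struct_encoder(frames.reshape(B, T, -1))    # (B, T, struct_dim)
            _, state = self.lstm(structs)                              # warm up on observed structure
            last_frame = frames[:, -1].reshape(B, -1)
            s = structs[:, -1]
            futures = []
            for _ in range(horizon):
                # The next structure is predicted from the previous *structure*, never from pixels.
                out, state = self.lstm(s.unsqueeze(1), state)
                s = self.to_struct(out[:, -1])
                frame = self.generator(torch.cat([s, last_frame], dim=1))
                futures.append(frame.reshape(B, 3, 64, 64))
            return torch.stack(futures, dim=1)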

    iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection

    Recent years have witnessed rapid progress in detecting and recognizing individual object instances. To understand the situation in a scene, however, computers need to recognize how humans interact with surrounding objects. In this paper, we tackle the challenging task of detecting human-object interactions (HOI). Our core idea is that the appearance of a person or an object instance contains informative cues about which parts of an image to attend to for facilitating interaction prediction. To exploit these cues, we propose an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance. Such an attention-based network allows us to selectively aggregate features relevant for recognizing HOIs. We validate the efficacy of the proposed network on the Verbs in COCO (V-COCO) and HICO-DET datasets and show that our approach compares favorably with the state of the art. Comment: BMVC 2018. Project webpage: https://gaochen315.github.io/iCAN/ Code: https://github.com/vt-vl-lab/iCAN
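
    The instance-centric attention module can be sketched roughly as dot-product attention in which the instance's pooled appearance acts as a query over the image feature map; the dimensions and single-head form below are my assumptions, not the released iCAN architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InstanceCentricAttention(nn.Module):
        def __init__(self, feat_dim=512, key_dim=256):
            super().__init__()
            self.query = nn.Linear(feat_dim, key_dim)   # from the instance's pooled appearance
            self.key = nn.Conv2d(feat_dim, key_dim, 1)  # from the image feature map
            self.value = nn.Conv2d(feat_dim, feat_dim, 1)

        def forward(self, inst_feat, feat_map):
            # inst_feat: (B, C) pooled appearance of a human/object box.
            # feat_map:  (B, C, H, W) convolutional features of the whole image.
            B, C, H, W = feat_map.shape
            q = self.query(inst_feat).unsqueeze(1)                    # (B, 1, K)
            k = self.key(feat_map).reshape(B, -1, H * W)              # (B, K, HW)
            attn = F.softmax(torch.bmm(q, k), dim=-1)                 # (B, 1, HW) attention map
            v = self.value(feat_map).reshape(B, C, H * W)             # (B, C, HW)
            context = torch.bmm(v, attn.transpose(1, 2)).squeeze(-1)  # (B, C) aggregated context
            # The contextual feature is concatenated with the instance feature downstream.
            return torch.cat([inst_feat, context], dim=1)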

    Visualizing an adjustable WO3/p-GaN heterojunction

    p-n junctions based on typical semiconductors are the elementary units of modern electronic devices and the chip industry; however, the rectification property of such junctions is usually fixed once the unit is fabricated. Here, we propose an adjustable n-WO3/p-GaN heterojunction with controllable electronic properties. The as-prepared n-WO3/p-GaN heterojunction is almost transparent and shows typical p-n junction rectification. By gradually doping hydrogen atoms into the WO3 layer via a facile electron-proton synergistic route, the heterojunction can be tuned dynamically, step by step, from a typical p-n junction (n-WO3/p-GaN) to a standard Schottky contact (HxWO3/p-GaN). More importantly, this evolution can be visualized directly by eye owing to the pronounced electrochromic characteristic of the WO3 layer. By connecting two HxWO3/p-GaN heterojunctions, controllable bi-functional rectification can be achieved. In addition, the HxWO3/p-GaN heterojunction can be recovered to the original p-n junction simply by annealing in ambient conditions, demonstrating that the heterojunction is controllable and reusable. The current study opens up tremendous opportunities for future dynamic electronic devices. Comment: 8 pages, 2 figures

    DRG: Dual Relation Graph for Human-Object Interaction Detection

    We tackle the challenging problem of human-object interaction (HOI) detection. Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features. In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph (one human-centric and one object-centric). Our proposed dual relation graph effectively captures discriminative cues from the scene to resolve ambiguity from local predictions. Our model is conceptually simple and leads to favorable results compared to state-of-the-art HOI detection algorithms on two large-scale benchmark datasets. Comment: ECCV 2020. Project: http://chengao.vision/DRG/ Code: https://github.com/vt-vl-lab/DRG
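
    A much-simplified sketch of the dual relation graph idea (my own illustration, not the released DRG code): each human-object pair is a node carrying a spatial-semantic feature, a human-centric pass aggregates over pairs sharing the same human, and an object-centric pass aggregates over pairs sharing the same object.

    import torch
    import torch.nn.functional as F

    def relation_graph_pass(pair_feats, group_ids, proj):
        # pair_feats: (P, D) spatial-semantic features of human-object pairs.
        # group_ids:  (P,) index of the shared human (or object) for each pair.
        # proj:       nn.Linear(D, D) used to score pairwise affinities.
        scores = proj(pair_feats) @ pair_feats.t()                    # (P, P) affinities
        same_group = group_ids.unsqueeze(0) == group_ids.unsqueeze(1)
        scores = scores.masked_fill(~same_group, float("-inf"))
        attn = F.softmax(scores, dim=1)                               # attend only within the group
        return pair_feats + attn @ pair_feats                         # residual contextual update

    def dual_relation_graph(pair_feats, human_ids, object_ids, proj_h, proj_o):
        refined = relation_graph_pass(pair_feats, human_ids, proj_h)  # human-centric graph
        return relation_graph_pass(refined, object_ids, proj_o)       # object-centric graph

    # Example usage with hypothetical sizes:
    # feats = torch.randn(6, 256)
    # h_ids = torch.tensor([0, 0, 1, 1, 2, 2]); o_ids = torch.tensor([0, 1, 0, 1, 0, 1])
    # out = dual_relation_graph(feats, h_ids, o_ids, torch.nn.Linear(256, 256), torch.nn.Linear(256, 256))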

    Gate-Controlled VO2 Phase Transition for High-Performance Smart Window

    VO2 is a promising material for developing energy-saving "smart windows", owing to its thermochromic property induced by the metal-insulator transition (MIT). However, its practical application is greatly limited by the relatively high critical transition temperature (~68 °C), low luminous transmittance (<60%) and poor solar-energy regulation ability (<15%). Here we developed reversible and non-volatile electric-field control of the MIT in a monoclinic VO2 film. With a gating treatment assisted by a solid electrolyte layer, we modulated the insertion/extraction of hydrogen into/from the VO2 lattice at room temperature, causing tri-state phase transitions accompanied by controllable transmittance adjustment. The dramatic increase in visible/infrared transmittance during the phase transition from the metallic (lightly H-doped) to the insulating (heavily H-doped) phase leads to a solar-energy regulation ability of up to 26.5% while maintaining 70.8% luminous transmittance. These results surpass all previous records and even exceed the theoretical limit for traditional VO2 smart windows, removing intrinsic disadvantages of VO2 for energy-saving applications. Our findings not only demonstrate an electric-field-controlled phase modulation strategy but also open the door to high-performance VO2-based smart window applications. Comment: 19 pages, 5 figures

    Spatially-resolved insulator-metal transition for rewritable optical gratings

    Doping is an effective way to tune the properties of metal oxides [1-5] for achieving functional oxide electronics [6-8]. Previously we developed a controllable hydrogen doping technology operating at ambient conditions based on an electron-proton synergistic doping strategy, which avoids the high-temperature/high-pressure treatments required by traditional technologies [9]. Here, based on this facile doping route, we achieve a visually observable and reversible insulator-metal transition (MIT) in a tungsten trioxide (WO3) film. Its spatial selectivity is comparable to that of standard UV lithography, showing its potential as a viable route for fabricating rewritable WO3 grating devices. Furthermore, the period of the resulting WO3 structural grating can be easily changed as required by selecting the doping area. This advanced doping technology opens up alternative approaches for developing not only optical devices but also rewritable ionic devices and integrated circuits for various oxide electronics. Comment: 16 pages, 4 figures

    Electron-proton Co-doping Induced Metal-insulator Transition in VO2 Film via Surface Self-assembled Ascorbic Acid Molecules

    Charge doping is an effective way to induce the metal-insulator transition (MIT) in correlated materials for many important applications, but it is practically limited by poor stability. In this study, we have achieved pronounced phase modulation and stabilized the metallic state of monoclinic vanadium dioxide (VO2) at room temperature via a novel electron-proton co-doping mechanism driven by the surface adsorption of self-assembled L-ascorbic acid (AA) molecules. The ionized AA- species in solution donate electrons to the adsorbed VO2 surface, which then electrostatically attracts surrounding protons to penetrate the lattice, eventually resulting in stable hydrogen-doped metallic VO2. The variations of the phase and electronic structures, as well as the electron occupancy of the V-3d/O-2p hybrid orbitals, were examined by synchrotron characterizations and first-principles theoretical simulations, which explain the formation of the stable metallic state. Importantly, the adsorbed molecules prevent hydrogen dopants from escaping the lattice and thereby stabilize the metallic phase of VO2. Such an electron-proton co-doping mechanism driven by the adsorption of suitable molecules could open a new door for engineering the properties of correlated oxide materials. Comment: 25 pages, 5 figures

    Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling

    Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module. We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO. Inspired by prior geometric systems, we allow the networks to see beyond a small temporal window during training through a novel loss that incorporates temporally distant frames (e.g., O(100) frames apart). Given GPU memory constraints, we propose a stage-wise training mechanism, where the first stage operates in a local time window and the second stage refines the poses with a "global" loss given the first-stage features. We demonstrate competitive results on several standard VO datasets, including KITTI and TUM RGB-D. Comment: ECCV 2020. Project page: https://yuliang.vision/LTMVO/
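
    The cycle consistency term could, for illustration, take the following form (an assumption about the exact loss, not the paper's formulation): compose the predicted relative poses around a temporal loop and penalize the deviation of the composition from the identity, mimicking a loop-closure constraint over a long window.

    import torch

    def pose_cycle_consistency(rel_poses):
        # rel_poses: (N, 4, 4) homogeneous transforms predicted by the pose network,
        # assumed to traverse a closed loop (e.g., i -> i+1 -> ... -> i).
        eye = torch.eye(4, dtype=rel_poses.dtype, device=rel_poses.device)
        composed = eye
        for T in rel_poses:
            composed = composed @ T
        # Penalize both the residual rotation and the residual translation of the loop.
        return (composed - eye).abs().sum()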