    Cascaded Scene Flow Prediction using Semantic Segmentation

    Given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. Many existing approaches use superpixels for regularization, but may predict inconsistent shapes and motions inside rigidly moving objects. We instead assume that scenes consist of foreground objects rigidly moving in front of a static background, and use semantic cues to produce pixel-accurate scene flow estimates. Our cascaded classification framework accurately models 3D scenes by iteratively refining semantic segmentation masks, stereo correspondences, 3D rigid motion estimates, and optical flow fields. We evaluate our method on the challenging KITTI autonomous driving benchmark, and show that accounting for the motion of segmented vehicles leads to state-of-the-art performance. Comment: International Conference on 3D Vision (3DV), 2017 (oral presentation).
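
    The cascade itself is easy to sketch. Below is a minimal Python outline of the refinement loop, assuming hypothetical stage functions (update_masks, update_disparity, update_rigid_motion, update_flow) that stand in for the paper's learned and optimized components; the names, signatures, and placeholder outputs are illustrative, not the authors' API.

        import numpy as np

        # All four stage functions are hypothetical placeholders standing in
        # for the paper's learned/optimized components.

        def update_masks(frames, disparity, flow):
            """Re-estimate per-vehicle segmentation masks given current geometry/motion."""
            return np.zeros(frames[0].shape[:2], dtype=int)  # placeholder

        def update_disparity(frames, masks):
            """Re-estimate stereo correspondences, regularized by the masks."""
            return np.ones(frames[0].shape[:2])  # placeholder

        def update_rigid_motion(masks, disparity, flow):
            """Fit one 6-DoF rigid motion (R, t) per segmented vehicle."""
            return {0: (np.eye(3), np.zeros(3))}  # placeholder: identity motion

        def update_flow(frames, masks, disparity, motions):
            """Predict optical flow by composing rigid motions with the 3D geometry."""
            return np.zeros(frames[0].shape[:2] + (2,))  # placeholder

        def cascaded_scene_flow(frames, n_iters=3):
            """Iteratively refine the four coupled estimates, cascade-style."""
            h, w = frames[0].shape[:2]
            masks = np.zeros((h, w), dtype=int)
            disparity = np.ones((h, w))
            flow = np.zeros((h, w, 2))
            for _ in range(n_iters):
                masks = update_masks(frames, disparity, flow)
                disparity = update_disparity(frames, masks)
                motions = update_rigid_motion(masks, disparity, flow)
                flow = update_flow(frames, masks, disparity, motions)
            return masks, disparity, motions, flow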

    Cross Modal Distillation for Flood Extent Mapping

    The increasing intensity and frequency of floods is one of the many consequences of our changing climate. In this work, we explore ML techniques that improve the flood detection module of an operational early flood warning system. Our method exploits an unlabelled dataset of paired multi-spectral and Synthetic Aperture Radar (SAR) imagery to reduce the labelling requirements of a purely supervised learning method. Prior works have used unlabelled data by deriving weak labels from it; however, our experiments showed that a model trained this way still learns the mistakes in those weak labels. Motivated by knowledge distillation and semi-supervised learning, we explore the use of a teacher to train a student with the help of a small hand-labelled dataset and a large unlabelled dataset. Unlike the conventional self-distillation setup, we propose a cross-modal distillation framework that transfers supervision from a teacher trained on the richer modality (multi-spectral images) to a student model trained on SAR imagery. The trained models are then tested on the Sen1Floods11 dataset. Our model outperforms the Sen1Floods11 baseline model trained on weakly labelled SAR imagery by an absolute margin of 6.53% Intersection-over-Union (IoU) on the test split.
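
    As a sketch of the cross-modal distillation idea (not the paper's exact architecture or loss weighting), the PyTorch snippet below trains a SAR student against soft targets from a frozen multi-spectral teacher, with an optional supervised term on the small hand-labelled set. The channel counts, temperature T, and blending weight alpha are assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        # Hypothetical models: a frozen teacher on multi-spectral input (e.g. 13
        # Sentinel-2 bands) and a student on SAR input (e.g. 2 polarizations).
        # Real networks would be deeper segmentation models.
        teacher = nn.Conv2d(13, 2, kernel_size=3, padding=1)
        student = nn.Conv2d(2, 2, kernel_size=3, padding=1)
        teacher.eval()

        def distillation_step(sar, multispectral, labels=None, alpha=0.5, T=2.0):
            with torch.no_grad():                    # teacher provides soft targets
                soft = F.softmax(teacher(multispectral) / T, dim=1)
            logits = student(sar)
            # Distillation term: match the student's tempered distribution
            # to the teacher's soft targets.
            loss = F.kl_div(F.log_softmax(logits / T, dim=1), soft,
                            reduction="batchmean") * T * T
            if labels is not None:                   # small hand-labelled subset
                loss = alpha * loss + (1 - alpha) * F.cross_entropy(logits, labels)
            return loss

        # Unlabelled pair: only the distillation term applies.
        sar = torch.randn(4, 2, 64, 64)
        ms = torch.randn(4, 13, 64, 64)
        print(distillation_step(sar, ms).item())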

    Towards Label-free Scene Understanding by Vision Foundation Models

    Vision foundation models such as Contrastive Vision-Language Pre-training (CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot performance on image classification and segmentation tasks. However, the combination of CLIP and SAM for label-free scene understanding has yet to be explored. In this paper, we investigate the potential of vision foundation models for enabling networks to comprehend 2D and 3D worlds without labelled data. The primary challenge lies in effectively supervising networks under extremely noisy pseudo labels, which are generated by CLIP and further exacerbated during propagation from the 2D to the 3D domain. To tackle these challenges, we propose a novel Cross-modality Noisy Supervision (CNS) method that leverages the strengths of CLIP and SAM to supervise 2D and 3D networks simultaneously. In particular, we introduce a prediction consistency regularization to co-train the 2D and 3D networks, and further impose latent-space consistency between the networks using SAM's robust feature representation. Experiments conducted on diverse indoor and outdoor datasets demonstrate the superior performance of our method in understanding 2D and 3D open environments. Our 2D and 3D networks achieve label-free semantic segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and 7.9%, respectively. On the nuScenes dataset, our method reaches 26.8% mIoU, an improvement of 6%. Code will be released at https://github.com/runnanchen/Label-Free-Scene-Understanding.
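
    A minimal sketch of the prediction-consistency part of such co-training might look as follows, assuming paired 2D/3D logits and shared pseudo labels; the symmetric-KL consistency term and the weight lam are illustrative choices, not the released CNS code.

        import torch
        import torch.nn.functional as F

        def consistency_loss(logits_2d, logits_3d):
            """Symmetric KL between the two networks' predictions at paired points."""
            log_p2d = F.log_softmax(logits_2d, dim=-1)
            log_p3d = F.log_softmax(logits_3d, dim=-1)
            return 0.5 * (F.kl_div(log_p2d, log_p3d.exp(), reduction="batchmean")
                          + F.kl_div(log_p3d, log_p2d.exp(), reduction="batchmean"))

        def cns_step(logits_2d, logits_3d, pseudo_labels, lam=0.1):
            """Noisy pseudo-label supervision on both branches plus a consistency term."""
            supervised = (F.cross_entropy(logits_2d, pseudo_labels)
                          + F.cross_entropy(logits_3d, pseudo_labels))
            return supervised + lam * consistency_loss(logits_2d, logits_3d)

        # Toy example: 1024 paired points, 20 semantic classes.
        l2d, l3d = torch.randn(1024, 20), torch.randn(1024, 20)
        pseudo = torch.randint(0, 20, (1024,))
        print(cns_step(l2d, l3d, pseudo).item())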

    Facial micro-expression recognition with noisy labels

    Abstract. Facial micro-expressions are quick, involuntary, low-intensity facial movements. Interest in detecting and recognizing micro-expressions arises from the fact that they reveal a person’s genuine hidden emotions. The small, rapid facial muscle movements make it difficult for a human not only to spot an occurring micro-expression but also to recognize the emotion correctly. Recent work on improving micro-expression recognition has focused on models and architectures. We instead take a step back and go to the root of the task: the data. We thoroughly analyze the input data and find that some of it is noisy and possibly mislabelled; the authors of the micro-expression datasets have themselves acknowledged possible problems in data labelling. Despite this, to the best of our knowledge, no attempts have been made to design micro-expression recognition models that account for potentially mislabelled data. In this thesis, we explore new methods that take noisy labels explicitly into account. We propose a simple yet effective label refurbishing method and a data cleaning method for handling noisy labels. Through both quantitative and qualitative analysis, we show the effectiveness of these methods for detecting noisy samples. The data cleaning method achieves state-of-the-art results, reaching an F1-score of 0.77 on the MEGC2019 composite dataset. Finally, we analyze and discuss the results in depth and suggest future work based on our findings.
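
    As an illustration of what a label refurbishing loss and a loss-based cleaning rule can look like, here is a generic soft-bootstrapping sketch in PyTorch; the blending weight beta and the keep-fraction heuristic are assumptions, not necessarily the thesis's exact formulation.

        import torch
        import torch.nn.functional as F

        def refurbished_loss(logits, noisy_labels, beta=0.8):
            """Soft bootstrapping: target = beta * given label + (1 - beta) * prediction."""
            one_hot = F.one_hot(noisy_labels, logits.size(1)).float()
            with torch.no_grad():
                pred = F.softmax(logits, dim=1)       # model's own belief
            target = beta * one_hot + (1 - beta) * pred
            return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

        def clean_indices(per_sample_losses, keep_fraction=0.9):
            """Cleaning rule: keep the samples with the lowest loss, drop the rest."""
            k = int(keep_fraction * len(per_sample_losses))
            return torch.argsort(per_sample_losses)[:k]

        # Toy example: 32 samples, 5 micro-expression classes.
        logits = torch.randn(32, 5)
        labels = torch.randint(0, 5, (32,))
        print(refurbished_loss(logits, labels).item())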

    The Regularized Iteratively Reweighted MAD Method for Change Detection in Multi- and Hyperspectral Data

    LiveCap: Real-time Human Performance Capture from Monocular Video

    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time-capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting per-frame non-linear optimization problems are solved with specially tailored data-parallel Gauss-Newton solvers. To achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture, and it yields accuracy comparable to offline performance capture techniques while being orders of magnitude faster.
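
    At its core, each of those per-frame problems is a nonlinear least-squares energy E(x) = ||r(x)||^2, which Gauss-Newton minimizes by repeatedly solving the normal equations. The toy NumPy sketch below shows that core iteration; LiveCap's actual solvers are data-parallel GPU variants over photometric and silhouette residuals, not this small dense version.

        import numpy as np

        def gauss_newton(residual, jacobian, x0, n_iters=10):
            """Minimize ||r(x)||^2 by repeatedly solving the normal equations."""
            x = x0.copy()
            for _ in range(n_iters):
                r = residual(x)                           # residual vector r(x)
                J = jacobian(x)                           # Jacobian dr/dx
                dx = np.linalg.solve(J.T @ J, -J.T @ r)   # (J^T J) dx = -J^T r
                x = x + dx
            return x

        # Toy example: fit (a, b) in y = a * exp(b * t) to noisy samples.
        t = np.linspace(0.0, 1.0, 50)
        y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(50)
        res = lambda p: p[0] * np.exp(p[1] * t) - y
        jac = lambda p: np.stack([np.exp(p[1] * t),
                                  p[0] * t * np.exp(p[1] * t)], axis=1)
        print(gauss_newton(res, jac, np.array([1.0, -1.0])))  # approx. [2.0, -1.5]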