
    Adaptive Multimodal Fusion For Facial Action Units Recognition

    Multimodal facial action unit (AU) recognition aims to build models that can process, correlate, and integrate information from multiple modalities (i.e., 2D images from a visual sensor, 3D geometry from 3D imaging, and thermal images from an infrared sensor). Although multimodal data provide rich information, two challenges must be addressed when learning from them: 1) the model must capture the complex cross-modal interactions in order to exploit the additional and mutual information effectively; 2) the model must remain robust to unexpected data corruption at test time, such as a modality being missing or noisy. In this paper, we propose a novel Adaptive Multimodal Fusion method (AMF) for AU detection, which learns to select the most relevant feature representations from the different modalities via a re-sampling procedure conditioned on a feature scoring module. The feature scoring module evaluates the quality of the features learned from each modality, so AMF can adaptively select the more discriminative features, increasing robustness to missing or corrupted modalities. In addition, to alleviate over-fitting and improve generalization to the test data, a cut-switch multimodal data augmentation method is designed, in which a random block is cut out and switched across modalities. We conduct a thorough investigation on two public multimodal AU datasets, BP4D and BP4D+, and the results demonstrate the effectiveness of the proposed method. Ablation studies under various conditions also show that our method remains robust to missing or noisy modalities at test time.
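    To make the cut-switch augmentation concrete, the sketch below swaps a random spatial block between two modalities of each sample. It is only an illustrative reading of the description above: the block-size parameter, the random pairing of modalities, and the assumption that all modality tensors are spatially aligned with the same channel count are choices made here, not details taken from the paper.

```python
import torch

def cut_switch(modalities, block_frac=0.3):
    """Illustrative cut-switch augmentation: cut a random block and swap it
    between two randomly chosen modalities of the same sample.

    modalities: list of tensors, each (B, C, H, W), spatially aligned and
                assumed here to share the same channel count C.
    block_frac: side of the switched block as a fraction of H and W
                (a free parameter in this sketch).
    """
    assert len(modalities) >= 2
    B, _, H, W = modalities[0].shape
    bh, bw = int(H * block_frac), int(W * block_frac)

    out = [m.clone() for m in modalities]
    for b in range(B):
        # pick two distinct modalities and a random block location
        i, j = torch.randperm(len(modalities))[:2].tolist()
        y = torch.randint(0, H - bh + 1, (1,)).item()
        x = torch.randint(0, W - bw + 1, (1,)).item()
        # switch the block contents between the two modalities
        patch_i = out[i][b, :, y:y+bh, x:x+bw].clone()
        out[i][b, :, y:y+bh, x:x+bw] = out[j][b, :, y:y+bh, x:x+bw]
        out[j][b, :, y:y+bh, x:x+bw] = patch_i
    return out
```

    The feature scoring and re-sampling fusion itself is not sketched, since the abstract does not specify how the scores are computed.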

    Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation

    In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained monocular RGB night-time images, a difficult task that has not been addressed adequately in the literature. State-of-the-art day-time depth estimation methods fail badly when tested on night-time images because of the large domain shift between them. The usual photometric losses used to train these networks may not work for night-time images owing to the absence of the uniform lighting that is common in day-time images. We propose to solve this by posing it as a domain adaptation problem in which a network trained on day-time images is adapted to work on night-time images. Specifically, an encoder is trained to generate features from night-time images that are indistinguishable from those obtained from day-time images, using a PatchGAN-based adversarial discriminative learning method. Unlike existing methods that directly adapt the depth prediction (network output), we adapt the feature maps produced by the encoder, so that a pre-trained day-time depth decoder can be used unchanged to predict depth from these adapted features. The resulting method is therefore termed "Adversarial Domain Feature Adaptation (ADFA)", and its efficacy is demonstrated through experiments on the challenging Oxford night driving dataset. The modular encoder-decoder architecture of ADFA also allows the encoder to be used as a feature extractor in many other applications. One such application is demonstrated: features from our adapted encoder outperform other state-of-the-art methods on a visual place recognition problem, further establishing the usefulness and effectiveness of the proposed approach.
    Comment: ECCV 202
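    The sketch below illustrates the feature-level adversarial alignment described above: a frozen day-time encoder provides the "real" feature distribution, a PatchGAN-style discriminator scores feature maps patch by patch, and the night-time encoder is updated to fool it. The function and optimizer names, the binary cross-entropy GAN objective, and the single-step training loop are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adaptation_step(day_imgs, night_imgs, enc_day, enc_night, disc,
                    opt_enc, opt_disc):
    """One illustrative feature-adaptation step (hypothetical names/losses).

    enc_day   : frozen encoder pre-trained on day-time images.
    enc_night : night-time encoder being adapted.
    disc      : PatchGAN-style discriminator over feature maps, returning a
                per-patch real/fake logit map.
    """
    with torch.no_grad():
        feat_day = enc_day(day_imgs)        # target ("real") feature distribution
    feat_night = enc_night(night_imgs)      # features to be aligned

    # Discriminator update: day features labelled real (1), night features fake (0).
    opt_disc.zero_grad()
    d_real = disc(feat_day)
    d_fake = disc(feat_night.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_disc.step()

    # Encoder update: make night features indistinguishable from day features.
    opt_enc.zero_grad()
    d_adapt = disc(feat_night)
    loss_g = F.binary_cross_entropy_with_logits(d_adapt, torch.ones_like(d_adapt))
    loss_g.backward()
    opt_enc.step()
    return loss_d.item(), loss_g.item()
```

    At test time, depth for a night image would then be obtained as dec_day(enc_night(night_img)), with the pre-trained day-time decoder kept frozen.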

    KiDS-i-800: Comparing weak gravitational lensing measurements in same-sky surveys

    We present a weak gravitational lensing analysis of 815 square degrees of i-band imaging from the Kilo-Degree Survey (KiDS-i-800). In contrast to the deep r-band observations, which take priority during excellent seeing conditions and form the primary KiDS dataset (KiDS-r-450), the complementary yet shallower KiDS-i-800 spans a wide range of observing conditions. The overlapping KiDS-i-800 and KiDS-r-450 imaging therefore provides a unique opportunity to assess the robustness of weak lensing measurements. In our analysis, we introduce two new `null' tests. The `nulled' two-point shear correlation function uses a matched catalogue to show that the calibrated KiDS-i-800 and KiDS-r-450 shear measurements agree at the level of 1 ± 4%. We use five galaxy lens samples to determine a `nulled' galaxy-galaxy lensing signal from the full KiDS-i-800 and KiDS-r-450 surveys and find that the measurements agree to 7 ± 5% when the KiDS-i-800 source redshift distribution is calibrated using either spectroscopic redshifts or the 30-band photometric redshifts from the COSMOS survey.
    Comment: 24 pages, 20 figures. Submitted to MNRAS. Comments welcome
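    For reference, the null tests above are built on the standard two-point shear correlation estimator; the paper's specific `nulled' statistic combines matched i- and r-band ellipticity catalogues, which is not reproduced here.

```latex
% Standard two-point shear correlation estimator, with tangential and cross
% ellipticity components \epsilon_t, \epsilon_\times and per-galaxy weights w,
% summed over galaxy pairs (a, b) whose separation falls in the bin around \theta:
\hat{\xi}_{\pm}(\theta) =
  \frac{\sum_{ab} w_a w_b \left[ \epsilon_t^{a}\,\epsilon_t^{b}
        \pm \epsilon_\times^{a}\,\epsilon_\times^{b} \right]}
       {\sum_{ab} w_a w_b}
```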

    Straight to Shapes: Real-time Detection of Encoded Shapes

    Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields, to the best of our knowledge, the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to those in the training set, something that most existing approaches fail to handle.
    Comment: 16 pages including appendix; Published at CVPR 201
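    The sketch below shows the kind of denoising convolutional auto-encoder that could establish such a shape embedding, and how a frozen decoder would turn regressed shape codes back into masks at detection time. The layer sizes, the 64x64 mask resolution, the 20-dimensional code and the Gaussian corruption are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class ShapeAutoencoder(nn.Module):
    """Illustrative denoising conv auto-encoder for binary shape masks."""
    def __init__(self, embed_dim=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, embed_dim),                    # shape code
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),     # mask logits
        )

    def forward(self, mask, noise_std=0.1):
        # Denoising objective: corrupt the input mask, reconstruct the clean one.
        z = self.encoder(mask + noise_std * torch.randn_like(mask))
        return self.decoder(z), z

# At detection time the fast detector regresses a shape code per box, and the
# frozen decoder turns it back into a mask (detector_head is hypothetical):
#   shape_code  = detector_head(box_features)        # (N, embed_dim)
#   mask_logits = autoencoder.decoder(shape_code)    # (N, 1, 64, 64)
```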

    Dark Matter in the Galaxy Cluster CL J1226+3332 at Z=0.89

    We present a weak-lensing analysis of the galaxy cluster CL J1226+3332 at z=0.89 using Hubble Space Telescope Advanced Camera for Surveys images. The cluster is the hottest (>10 keV), most X-ray luminous system at z>0.6 known to date. The relaxed X-ray morphology, as well as its high temperature, is unusual at such a high redshift. Our mass reconstruction shows that on a large scale the dark matter distribution is consistent with a relaxed system with no significant substructures. However, on a small scale the cluster core is resolved into two mass clumps highly correlated with the cluster galaxy distribution. The dominant mass clump lies close to the brightest cluster galaxy, whereas the other, less massive clump is located ~40" (~310 kpc) to the southwest. Although this secondary mass clump does not show an excess in the X-ray surface brightness, the gas temperature of the region is much higher (12-18 keV) than that of the rest of the cluster. We propose a scenario in which the less massive system has already passed through the main cluster and its X-ray gas was stripped during the passage. The elongation of the X-ray peak toward the southwestern mass clump also supports this possibility. We measure significant tangential shears out to the field boundary (~1.5 Mpc), which are well described by a Navarro-Frenk-White (NFW) profile with a concentration parameter of c200 = 2.7 ± 0.3 and a scale length of rs = 78" ± 19" (~600 kpc), with chi^2/d.o.f. = 1.11. Within the spherical volume of radius r200 = 1.6 Mpc, the total mass of the cluster is M(r<r200) = (1.4 ± 0.2) x 10^15 solar masses. Our weak-lensing analysis confirms that CL J1226+3332 is indeed the most massive cluster known to date at z>0.6.
    Comment: Accepted for publication in Ap
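    For reference, the NFW profile used in the fit has the standard form below, where rs is the scale radius and the characteristic density ρs is fixed by the concentration and the critical density at the cluster redshift (it is not quoted in the abstract).

```latex
% Navarro-Frenk-White (NFW) density profile and its enclosed mass:
\rho(r) = \frac{\rho_s}{(r/r_s)\left(1 + r/r_s\right)^{2}},
\qquad
M(<r) = 4\pi\,\rho_s\, r_s^{3}
        \left[\ln\!\left(1 + \frac{r}{r_s}\right) - \frac{r/r_s}{1 + r/r_s}\right]
```

    As a consistency check of the quoted numbers, r200 = c200 rs ≈ 2.7 × 600 kpc ≈ 1.6 Mpc, matching the radius within which the total mass is reported.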