Adaptive Multimodal Fusion For Facial Action Units Recognition
Multimodal facial action unit (AU) recognition aims to build models that can process, correlate, and integrate information from multiple modalities (i.e., 2D images from a visual sensor, 3D geometry from 3D imaging, and thermal images from an infrared sensor). Although multimodal data can provide rich information, two challenges must be addressed when learning from them: 1) the model must capture the complex cross-modal interactions in order to exploit the additional and mutual information effectively; 2) the model must be robust to unexpected data corruption during testing, such as a modality being missing or noisy. In this paper, we propose a novel Adaptive Multimodal Fusion method (AMF) for AU detection, which learns to select the most relevant feature representations from different modalities via a re-sampling procedure conditioned on a feature scoring module. The feature scoring module is designed to evaluate the quality of the features learned from the individual modalities. As a result, AMF can adaptively select more discriminative features, increasing its robustness to missing or corrupted modalities. In addition, to alleviate over-fitting and improve generalization to the test data, we design a cut-switch multimodal data augmentation method, in which a random block is cut out and switched across modalities. We have conducted a thorough investigation on two public multimodal AU datasets, BP4D and BP4D+, and the results demonstrate the effectiveness of the proposed method. Ablation studies under various conditions also show that our method remains robust to missing or noisy modalities during testing.
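The cut-switch augmentation described above can be sketched as follows. This is a minimal illustration only: the function name, the block-size range, and the cyclic switching order are assumptions, not the authors' exact implementation.

```python
import numpy as np

def cut_switch(modalities, rng=None):
    """Sketch of cut-switch augmentation: cut one random rectangular
    block and switch it cyclically across the modalities.
    (Block-size range and switching order are assumptions.)"""
    if rng is None:
        rng = np.random.default_rng(0)
    out = [m.copy() for m in modalities]
    h, w = out[0].shape[:2]
    bh = int(rng.integers(1, h // 2 + 1))   # random block height
    bw = int(rng.integers(1, w // 2 + 1))   # random block width
    y = int(rng.integers(0, h - bh + 1))    # random block position
    x = int(rng.integers(0, w - bw + 1))
    # extract the same block from every modality, then switch cyclically
    blocks = [m[y:y + bh, x:x + bw].copy() for m in out]
    for i, m in enumerate(out):
        m[y:y + bh, x:x + bw] = blocks[(i + 1) % len(out)]
    return out
```

Because the block is only exchanged between modalities, the augmentation preserves each modality's overall content while forcing the model not to rely on any single modality at any location.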
Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation
In this paper, we look into the problem of estimating per-pixel depth maps
from unconstrained RGB monocular night-time images which is a difficult task
that has not been addressed adequately in the literature. The state-of-the-art
day-time depth estimation methods fail miserably when tested with night-time
images due to a large domain shift between them. The usual photometric losses
used for training these networks may not work for night-time images due to the
absence of uniform lighting which is commonly present in day-time images,
making it a difficult problem to solve. We propose to solve this problem by
posing it as a domain adaptation problem where a network trained with day-time
images is adapted to work for night-time images. Specifically, an encoder is
trained to generate features from night-time images that are indistinguishable
from those obtained from day-time images by using a PatchGAN-based adversarial
discriminative learning method. Unlike the existing methods that directly adapt
depth prediction (network output), we propose to adapt feature maps obtained
from the encoder network so that a pre-trained day-time depth decoder can be
directly used for predicting depth from these adapted features. Hence, the
resulting method is termed as "Adversarial Domain Feature Adaptation (ADFA)"
and its efficacy is demonstrated through experimentation on the challenging
Oxford night driving dataset. Also, the modular encoder-decoder architecture
for the proposed ADFA method allows us to use the encoder module as a feature
extractor which can be used in many other applications. One such application is
demonstrated where the features obtained from our adapted encoder network are
shown to outperform other state-of-the-art methods in a visual place
recognition problem, thereby further establishing the usefulness and
effectiveness of the proposed approach.
Comment: ECCV 202
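The modular design described above, a frozen day-time depth decoder reused behind an adapted night-time encoder, can be illustrated with a toy forward pass. Everything here is a hypothetical stand-in: the linear "networks", their shapes, and the closeness of the adapted weights are invented for illustration and are not the actual ADFA architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the networks (shapes are hypothetical): a pre-trained
# day-time encoder/decoder pair, and a night-time encoder that adversarial
# adaptation has driven toward producing day-like features.
W_day_enc = rng.standard_normal((16, 8))
W_depth_dec = rng.standard_normal((8, 1))                      # frozen day-time depth decoder
W_night_enc = W_day_enc + 0.01 * rng.standard_normal((16, 8))  # adapted night-time encoder

def predict_depth(image_vec, W_enc):
    feat = np.tanh(image_vec @ W_enc)   # encoder feature map (domain-aligned features)
    return feat @ W_depth_dec           # day-time decoder is reused unchanged

x = rng.standard_normal(16)
day_depth = predict_depth(x, W_day_enc)
night_depth = predict_depth(x, W_night_enc)  # close to day_depth once features align
```

The point of the sketch is the wiring: once the night-time encoder's features are indistinguishable from day-time features, the same decoder serves both domains, and the encoder alone can be reused as a feature extractor for other tasks such as place recognition.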
KiDS-i-800: Comparing weak gravitational lensing measurements in same-sky surveys
We present a weak gravitational lensing analysis of 815 square degrees of
i-band imaging from the Kilo-Degree Survey (KiDS-i-800). In contrast to the
deep r-band observations, which take priority during excellent seeing
conditions and form the primary KiDS dataset (KiDS-r-450), the complementary
yet shallower KiDS-i-800 spans a wide range of observing conditions. The
overlapping KiDS-i-800 and KiDS-r-450 imaging therefore provides a unique
opportunity to assess the robustness of weak lensing measurements. In our
analysis, we introduce two new `null' tests. The `nulled' two-point shear
correlation function uses a matched catalogue to show that the calibrated
KiDS-i-800 and KiDS-r-450 shear measurements agree at the level of \%. We use five galaxy lens samples to determine a `nulled' galaxy-galaxy
lensing signal from the full KiDS-i-800 and KiDS-r-450 surveys and find
that the measurements agree to \% when the KiDS-i-800 source
redshift distribution is calibrated using either spectroscopic redshifts or
the 30-band photometric redshifts from the COSMOS survey.
Comment: 24 pages, 20 figures. Submitted to MNRAS. Comments welcome
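The matched-catalogue "null" idea can be illustrated with a toy example. This is a simple per-galaxy difference statistic for illustration only, not the paper's actual nulled two-point shear correlation function, and all numbers (sample size, ellipticity dispersion, measurement scatter) are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Toy matched catalogue: the same galaxies measured in both surveys.
# A dispersion of ~0.28 per ellipticity component is a typical value.
e_450 = rng.normal(0.0, 0.28, size=(n, 2))           # deeper r-band measurement
e_800 = e_450 + rng.normal(0.0, 0.02, size=(n, 2))   # shallower i-band remeasurement

# If the two calibrated shear measurements agree, statistics built from
# the per-galaxy difference are consistent with zero ("nulled").
delta = e_800 - e_450
null_fraction = np.abs(delta.mean(axis=0)) / e_450.std()  # mean offset vs. shear dispersion
```

Building the statistic from per-object differences in a matched catalogue cancels the true cosmological signal, so any residual directly measures calibration differences between the two surveys.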
Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what to the best of our knowledge is the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle.
Comment: 16 pages including appendix; Published at CVPR 201
Dark Matter in the Galaxy Cluster CL J1226+3332 at Z=0.89
We present a weak-lensing analysis of the galaxy cluster CL J1226+3332 at
z=0.89 using Hubble Space Telescope Advanced Camera for Surveys images. The
cluster is the hottest (>10 keV), most X-ray luminous system at z>0.6 known to
date. The relaxed X-ray morphology, as well as its high temperature, is unusual
at such a high redshift. Our mass reconstruction shows that on a large scale
the dark matter distribution is consistent with a relaxed system with no
significant substructures. However, on a small scale the cluster core is
resolved into two mass clumps highly correlated with the cluster galaxy
distribution. The dominant mass clump lies close to the brightest cluster
galaxy whereas the other less massive clump is located ~40" (~310 kpc) to the
southwest. Although this secondary mass clump does not show an excess in the
X-ray surface brightness, the gas temperature of the region is much higher
(12-18 keV) than that of the rest of the cluster. We propose a scenario in which the less
massive system has already passed through the main cluster and the X-ray gas
has been stripped during this passage. The elongation of the X-ray peak toward
the southwestern mass clump is also supportive of this possibility. We measure
significant tangential shears out to the field boundary (~1.5 Mpc), which are
well described by a Navarro-Frenk-White profile with a concentration parameter
of c200=2.7+-0.3 and a scale length of rs=78"+-19" (~600 kpc) with
chi^2/d.o.f=1.11. Within the spherical volume r200=1.6 Mpc, the total mass of
the cluster becomes M(r<r200)=(1.4+-0.2) x 10^15 solar mass. Our weak-lensing
analysis confirms that CL1226+3332 is indeed the most massive cluster known to
date at z>0.6.
Comment: Accepted for publication in Ap
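The quoted NFW fit parameters can be checked for internal consistency: by definition r200 = c200 × r_s, and the standard NFW enclosed-mass profile has the shape M(<r) ∝ ln(1+x) − x/(1+x) with x = r/r_s. The formula is the standard NFW result, not taken from the abstract; only the parameter values are the paper's best-fit numbers.

```python
import numpy as np

# Best-fit values quoted in the abstract.
c200 = 2.7        # concentration parameter
r_s_kpc = 600.0   # scale length (~78 arcsec at z = 0.89)

# By definition, the virial radius is r200 = c200 * r_s.
r200_mpc = c200 * r_s_kpc / 1000.0   # ~1.6 Mpc, matching the quoted r200

# Standard NFW enclosed-mass shape: M(<r) is proportional to mu(x).
def nfw_mu(r_kpc, rs_kpc=r_s_kpc):
    x = r_kpc / rs_kpc
    return np.log1p(x) - x / (1.0 + x)

# Fraction of M(<r200) enclosed within one scale radius (illustration only).
frac_within_rs = nfw_mu(r_s_kpc) / nfw_mu(c200 * r_s_kpc)
```

The scale length and concentration thus reproduce the quoted r200 = 1.6 Mpc exactly, which is a useful sanity check on the fit as reported.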