Enabling Explainable Fusion in Deep Learning with Fuzzy Integral Neural Networks
Information fusion is an essential part of numerous engineering systems and
biological functions, e.g., human cognition. Fusion occurs at many levels,
ranging from the low-level combination of signals to the high-level aggregation
of heterogeneous decision-making processes. While the last decade has witnessed
an explosion of research in deep learning, fusion in neural networks has not
seen the same revolution. Specifically, most neural fusion approaches are
ad hoc, poorly understood, distributed rather than localized, and/or offer
little explainability (if any at all). Herein, we prove that the fuzzy
Choquet integral (ChI), a powerful nonlinear aggregation function, can be
represented as a multi-layer network, referred to hereafter as ChIMP. We also
put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient
descent-based optimization in light of the exponential number of ChI inequality
constraints. An additional benefit of ChIMP/iChIMP is that it enables
eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP
is applied to the fusion of a set of heterogeneous architecture deep models in
remote sensing. We show an improvement in model accuracy, and our previously
established XAI indices shed light on the quality of our data, model, and its
decisions. Comment: IEEE Transactions on Fuzzy Systems
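To make the aggregation operator named in this abstract concrete, here is a minimal sketch of the discrete Choquet integral in plain Python. It is not the ChIMP/iChIMP network from the paper; the function names, the dictionary-based measure, and the `measure_from_cardinality` helper are assumptions introduced only for illustration. The measure assigns a value to every non-empty subset of the inputs, which is where the exponential number of constraints mentioned above comes from.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

# A fuzzy measure maps each non-empty subset of input indices to a value in [0, 1].
Measure = Dict[FrozenSet[int], float]


def measure_from_cardinality(n: int, f: Callable[[int], float]) -> Measure:
    """Hypothetical helper: build a measure that depends only on subset size."""
    return {
        frozenset(c): f(len(c))
        for k in range(1, n + 1)
        for c in combinations(range(n), k)
    }


def choquet_integral(h: Sequence[float], g: Measure) -> float:
    """Discrete Choquet integral of the inputs h with respect to the measure g."""
    # Visit inputs from largest to smallest value.
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    total, prev, subset = 0.0, 0.0, frozenset()
    for i in order:
        subset = subset | {i}          # grow the set of "already visited" sources
        cur = g[subset]
        total += h[i] * (cur - prev)   # weight = increase in the measure
        prev = cur
    return total
```

With inputs `[0.2, 0.9, 0.5]`, a measure that is 1 on every subset reproduces the maximum, a measure that is 1 only on the full set reproduces the minimum, and a measure equal to `k / 3` for subsets of size `k` reproduces the arithmetic mean. This is one way to see why the operator is described as a flexible nonlinear aggregation function.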
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
Fusing LiDAR and camera information is essential for achieving accurate and
reliable 3D object detection in autonomous driving systems. However, this is
challenging due to the difficulty of combining multi-granularity geometric and
semantic features from two drastically different modalities. Recent approaches
aim at exploring the semantic densities of camera features through lifting
points in 2D camera images (referred to as seeds) into 3D space for fusion, and
they can be roughly divided into 1) early fusion of raw points that aims at
augmenting the 3D point cloud at the early input stage, and 2) late fusion of
BEV (bird's-eye view) maps that merges LiDAR and camera BEV features before the
detection head. While both have their merits in enhancing the representation
power of the combined features, this single-level fusion strategy is a
suboptimal solution to the aforementioned challenge. Its major drawback is
that the multi-granularity semantic features from the two distinct modalities
cannot interact sufficiently. To this end, we propose a novel framework
that focuses on the multi-scale progressive interaction of the
multi-granularity LiDAR and camera features. Our proposed method, abbreviated
as MSMDFusion, achieves state-of-the-art results in 3D object detection, with
69.1 mAP and 71.8 NDS on nuScenes validation set, and 70.8 mAP and 73.2 NDS on
nuScenes test set, which rank 1st and 2nd respectively among single-model
non-ensemble approaches at the time of submission.
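The idea of lifting 2D image points ("seeds") into 3D space can be illustrated with a pinhole back-projection. The sketch below is not the MSMDFusion implementation; the function name, the intrinsics `fx`, `fy`, `cx`, `cy`, and the use of several candidate depths per pixel are assumptions chosen only to show the geometry.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def lift_seed(u: float, v: float, depths: Sequence[float],
              fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Back-project pixel (u, v) to one 3D camera-frame point per candidate depth.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    return [((u - cx) * d / fx, (v - cy) * d / fy, d) for d in depths]
```

For example, `lift_seed(60.0, 45.0, [10.0, 20.0], 100.0, 100.0, 50.0, 40.0)` yields two points on the same viewing ray, one at depth 10 and one at depth 20.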
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example, wires on a table are not a trip hazard but can be
if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including 4 fusion and one non-fusion approach, using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites. Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L)
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from the
graph structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning-to-rank methods, as well as rank aggregation
approaches, for combining all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest to the adequacy of the proposed approaches. Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
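As an illustration of what rank aggregation over several expertise estimators can look like, the sketch below fuses ranked lists with a Borda count. Borda count is used here only as a well-known example of a data fusion rule; the abstract does not say which rules were evaluated, and the function name and the sample expert names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_fuse(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Fuse several ranked lists of candidates into one list via Borda count.

    Each ranking lists candidates from most to least expert. A candidate in
    position p (0-based) of a list earns n - p points, where n is the total
    number of distinct candidates; candidates missing from a list earn 0.
    """
    candidates = sorted({c for ranking in rankings for c in ranking})
    n = len(candidates)
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda c: (-scores[c], c))
```

Given one ranking per evidence source, for instance text `["alice", "bob", "carol"]`, citations `["bob", "alice", "carol"]`, and profile `["bob", "carol", "alice"]`, the fused order is `["bob", "alice", "carol"]`. A supervised learning-to-rank method would instead learn how to weight the sources from labelled queries.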
Biometric presentation attack detection: beyond the visible spectrum
The increased need for unattended authentication in
multiple scenarios has motivated a wide deployment of biometric
systems in the last few years. This has in turn led to the
disclosure of security concerns specifically related to biometric
systems. Among them, presentation attacks (PAs, i.e., attempts
to log into the system with a fake biometric characteristic or
presentation attack instrument) pose a severe threat to the
security of the system: any person could eventually fabricate
or order a gummy finger or face mask to impersonate someone
else. In this context, we present a novel fingerprint presentation
attack detection (PAD) scheme based on i) a new capture device
able to acquire images within the short wave infrared (SWIR)
spectrum, and ii) an in-depth analysis of several state-of-the-art
techniques based on both handcrafted and deep learning
features. The approach is evaluated on a database comprising
over 4700 samples, stemming from 562 different subjects and
35 different presentation attack instrument (PAI) species. The
results show the soundness of the proposed approach with a
detection equal error rate (D-EER) as low as 1.35% even in a
realistic scenario where five different PAI species are considered
only for testing purposes (i.e., unknown attacks).
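The detection equal error rate reported above is the operating point at which the two PAD error rates coincide. The sketch below estimates it from two lists of scores by sweeping a threshold. It assumes that higher scores mean "more likely bona fide" and uses a simple nearest-crossing rule; the function name and these conventions are assumptions, not the evaluation code behind the reported 1.35%.

```python
from typing import Sequence, Tuple


def detection_eer(bona_fide: Sequence[float],
                  attacks: Sequence[float]) -> Tuple[float, float]:
    """Return (approximate D-EER, threshold) for the given PAD scores.

    A sample is accepted as bona fide when its score is >= threshold.
    APCER: fraction of attack presentations wrongly accepted.
    BPCER: fraction of bona fide presentations wrongly rejected.
    """
    best_gap, best_eer, best_threshold = float("inf"), 1.0, 0.0
    for t in sorted(set(bona_fide) | set(attacks)):
        apcer = sum(s >= t for s in attacks) / len(attacks)
        bpcer = sum(s < t for s in bona_fide) / len(bona_fide)
        gap = abs(apcer - bpcer)
        if gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, best_eer, best_threshold = gap, (apcer + bpcer) / 2, t
    return best_eer, best_threshold
```

With bona fide scores `[0.9, 0.8, 0.7, 0.4]` and attack scores `[0.1, 0.2, 0.3, 0.6]`, both error rates equal 0.25 at a threshold of 0.6. In an unknown-attack protocol, the attack scores would come only from PAI species that were held out of training.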
LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks
In this work, a deep learning approach has been developed to carry out road
detection by fusing LIDAR point clouds and camera images. An unstructured and
sparse point cloud is first projected onto the camera image plane and then
upsampled to obtain a set of dense 2D images encoding spatial information.
Several fully convolutional neural networks (FCNs) are then trained to carry
out road detection, either by using data from a single sensor, or by using
three fusion strategies: early, late, and the newly proposed cross fusion.
Whereas in the former two fusion approaches, the integration of multimodal
information is carried out at a predefined depth level, the cross fusion FCN is
designed to directly learn from data where to integrate information; this is
accomplished by using trainable cross connections between the LIDAR and the
camera processing branches.
To further highlight the benefits of using a multimodal system for road
detection, a data set consisting of visually challenging scenes was extracted
from driving sequences of the KITTI raw data set. It was then demonstrated
that, as expected, a purely camera-based FCN severely underperforms on this
data set. A multimodal system, on the other hand, is still able to provide high
accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI
road benchmark where it achieved excellent performance, with a MaxF score of
96.03%, ranking it among the top-performing approaches.
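The first preprocessing step described in this abstract, projecting a sparse point cloud onto the camera image plane, can be sketched as follows. This is a generic pinhole projection under stated assumptions, not the authors' pipeline: the points are assumed to be already expressed in the camera frame, `P` is a hypothetical 3x4 projection matrix, and the subsequent upsampling to dense images is not shown.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (u, v, depth)


def project_points(points: Sequence[Tuple[float, float, float]],
                   P: Sequence[Sequence[float]],
                   width: int, height: int) -> List[Pixel]:
    """Project 3D points through a 3x4 matrix P and keep those inside the image."""
    pixels: List[Pixel] = []
    for x, y, z in points:
        hom = (x, y, z, 1.0)  # homogeneous coordinates
        u_h, v_h, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
        if w <= 0:
            continue  # point lies behind the camera
        u, v = u_h / w, v_h / w
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, w))
    return pixels
```

Each surviving point gives a pixel location plus a depth value, which yields a sparse image that the fusion strategies (early, late, or cross) can then consume alongside the camera image once it has been densified.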