Multi-instance learning with application to the profiling of multi-victim homicides
Other support: UAB transformative agreements. Homicide involving multiple victims has a significant negative effect on society. Criminal profiling consists of determining the traits of an unknown offender based on those of the crime and the victims, with a view to their identification. To provide the most likely profile of the perpetrator of a multi-victim homicide, we propose a supervised machine learning predictive model based on a Bayesian Network. Conventional classifiers can generate the perpetrator's profile from the traits of each victim of the same homicide, but the resulting profiles may differ from one another. To address this issue, we consider the Multi-Instance (MI) learning framework, in which the victims of the same incident form a bag, and each bag is associated with a unique label for each of the perpetrator's features. We introduce the unanimity MI assumption in this domain and accordingly allocate a label to the bag based on the labels and probabilities the Bayesian Network has assigned to its instances, using a combination rule drawn from those of ensembles of classifiers. We apply this methodology to the Federal Bureau of Investigation (FBI) homicide database to compare three combination rules empirically, in the validation process, as well as theoretically, and we use the one that ultimately proves best to build the final model, which is then applied in some illustrative examples to obtain the criminal profile. Grant: PID2021-123733NB-I0
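The three combination rules are not spelled out in the abstract; the following is a minimal sketch, assuming three common rules (mean, product, max) over per-instance class posteriors, with hypothetical function and variable names:

```python
import numpy as np

def combine_bag(instance_probs: np.ndarray, rule: str = "mean") -> int:
    """Aggregate per-victim (instance) class posteriors into one bag-level
    label for a perpetrator trait (sketch, not the paper's code)."""
    if rule == "mean":        # average the posteriors
        scores = instance_probs.mean(axis=0)
    elif rule == "product":   # multiply and renormalise
        scores = instance_probs.prod(axis=0)
        scores = scores / scores.sum()
    elif rule == "max":       # most confident instance wins
        scores = instance_probs.max(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(scores.argmax())

# Example: two victims of one homicide, one binary trait of the offender.
bag = np.array([[0.7, 0.3],
                [0.6, 0.4]])
print(combine_bag(bag, "product"))  # -> 0
```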
Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification
The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud computing technologies has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of existing vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task, and it results in lower accuracy and lower reliability, which limit existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, in this research a novel audio-visual multi-modality driven hybrid feature learning model is developed for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features using the Gray-Level Co-occurrence Matrix (GLCM) and an AlexNet transfer learning model. After extracting the different GLCM features and AlexNet deep features, horizontal concatenation was performed to fuse the different feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope and Harmonics-to-Noise Ratio (HNR). Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which was processed for classification using a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57%, and F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
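As a rough illustration of the fusion step, the sketch below horizontally concatenates precomputed visual (GLCM + AlexNet) and acoustic (GTCC/MFCC/spectral) feature arrays and trains a random forest; the feature extraction itself is omitted, and the array shapes, class count, and hyperparameters are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200                                # stand-in sample count
visual = rng.normal(size=(n, 256))     # GLCM + AlexNet features (precomputed)
acoustic = rng.normal(size=(n, 64))    # GTCC/MFCC/spectral features (precomputed)
labels = rng.integers(0, 4, size=n)    # four crowd classes (assumed)

# Horizontal concatenation fuses the two modalities into one feature set.
fused = np.hstack([visual, acoustic])

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```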
Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects
Robotic manipulation, in particular in-hand object manipulation, often
requires an accurate estimate of the object's 6D pose. To improve the accuracy
of the estimated pose, state-of-the-art approaches in 6D object pose estimation
use observational data from one or more modalities, e.g., RGB images, depth,
and tactile readings. However, existing approaches make limited use of the
underlying geometric structure of the object captured by these modalities,
thereby increasing their reliance on visual features. This results in poor
performance when presented with objects that lack such visual features or when
visual features are simply occluded. Furthermore, current approaches do not
take advantage of the proprioceptive information embedded in the position of
the fingers. To address these limitations, in this paper: (1) we introduce a
hierarchical graph neural network architecture for combining multimodal (vision
and touch) data that allows for a geometrically informed 6D object pose
estimation, (2) we introduce a hierarchical message passing operation that
propagates information within and across modalities to learn a graph-based
object representation, and (3) we introduce a method that accounts for the
proprioceptive information for in-hand object representation. We evaluate our
model on a diverse subset of objects from the YCB Object and Model Set, and
show that our method substantially outperforms existing state-of-the-art work
in accuracy and robustness to occlusion. We also deploy our proposed framework
on a real robot and qualitatively demonstrate successful transfer to real
settings.
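A toy sketch of the within- and across-modality message passing idea follows; the layer sizes, fully connected graph, and mean aggregation are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class CrossModalMessagePassing(nn.Module):
    """One within-modality and one across-modality message-passing step
    over node features of two modalities (toy sketch)."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.within = nn.Linear(dim, dim)
        self.across = nn.Linear(dim, dim)

    def forward(self, vision: torch.Tensor, touch: torch.Tensor):
        # Within-modality: each node receives its modality's mean message.
        vision = vision + self.within(vision.mean(0, keepdim=True))
        touch = touch + self.within(touch.mean(0, keepdim=True))
        # Across-modality: each modality receives a summary of the other.
        v_msg = vision.mean(0, keepdim=True)
        t_msg = touch.mean(0, keepdim=True)
        return vision + self.across(t_msg), touch + self.across(v_msg)

mp = CrossModalMessagePassing()
v, t = mp(torch.randn(10, 32), torch.randn(5, 32))  # 10 vision, 5 tactile nodes
```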
Machine learning in solar physics
The application of machine learning in solar physics has the potential to
greatly enhance our understanding of the complex processes that take place in
the atmosphere of the Sun. By using techniques such as deep learning, we are
now in the position to analyze large amounts of data from solar observations
and identify patterns and trends that may not have been apparent using
traditional methods. This can help us improve our understanding of explosive
events like solar flares, which can have a strong effect on the Earth's
environment. Predicting such hazardous events becomes crucial for our
technological society. Machine learning can also improve our understanding of
the inner workings of the Sun itself by allowing us to go deeper into the data
and to propose more complex models to explain them. Additionally, the use of
machine learning can help to automate the analysis of solar data, reducing the
need for manual labor and increasing the efficiency of research in this field.
Comment: 100 pages, 13 figures, 286 references; accepted for publication as a Living Review in Solar Physics (LRSP).
Defending Black-box Classifiers by Bayesian Boundary Correction
Classifiers based on deep neural networks have been recently challenged by
Adversarial Attack, where the widely existing vulnerability has invoked the
research in defending them from potential threats. Given a vulnerable
classifier, existing defense methods are mostly white-box and often require
re-training the victim under modified loss functions/training regimes. However,
the model/data/training specifics of the victim are usually unavailable to the
user, and re-training is unappealing, if not impossible, for reasons such as
limited computational resources. To this end, we propose a new black-box defense
framework. It can turn any pre-trained classifier into a resilient one with
little knowledge of the model specifics. This is achieved by new joint Bayesian
treatments on the clean data, the adversarial examples and the classifier, for
maximizing their joint probability. It is further equipped with a new
post-train strategy which keeps the victim intact. We name our framework
Bayesian Boundary Correction (BBC). BBC is a general and flexible framework
that can easily adapt to different data types. We instantiate BBC for image
classification and skeleton-based human activity recognition, for both static
and dynamic data. Exhaustive evaluation shows that BBC has superior robustness
and can enhance robustness without severely hurting the clean accuracy,
compared with existing defense methods.
Comment: arXiv admin note: text overlap with arXiv:2203.0471
Modeling Group Dynamics for Personalized Robot-Mediated Interactions
The field of human-human-robot interaction (HHRI) uses social robots to
positively influence how humans interact with each other. This objective
requires models of human understanding that consider multiple humans in an
interaction as a collective entity and represent the group dynamics that exist
within it. Understanding group dynamics is important because these can
influence the behaviors, attitudes, and opinions of each individual within the
group, as well as the group as a whole. Such an understanding is also useful
when personalizing an interaction between a robot and the humans in its
environment, where a group-level model can facilitate the design of robot
behaviors that are tailored to a given group, the dynamics that exist within
it, and the specific needs and preferences of the individual interactants. In
this paper, we highlight the need for group-level models of human understanding
in human-human-robot interaction research and how these can be useful in
developing personalization techniques. We survey existing models of group
dynamics and categorize them into models of social dominance, affect, social
cohesion, and conflict resolution. We highlight the important features these
models utilize, evaluate their potential to capture interpersonal aspects of a
social interaction, and highlight their value for personalization techniques.
Finally, we identify directions for future work, and make a case for models of
relational affect as an approach that can better capture group-level
understanding of human-human interactions and be useful in personalizing
human-human-robot interactions.
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Large vision-language models have achieved outstanding performance, but their
size and computational requirements make their deployment on
resource-constrained devices and time-sensitive tasks impractical. Model
distillation, the process of creating smaller, faster models that maintain the
performance of larger models, is a promising direction towards the solution.
This paper investigates the distillation of visual representations in large
teacher vision-language models into lightweight student models using a small-
or mid-scale dataset. Notably, this study focuses on open-vocabulary
out-of-distribution (OOD) generalization, a challenging problem that has been
overlooked in previous model distillation literature. We propose two
principles, from the vision and language modality perspectives, to enhance the
student's OOD generalization: (1) better imitating the teacher's visual
representation space and carefully promoting coherence in vision-language
alignment with the teacher; and (2) enriching the teacher's language
representations with informative and fine-grained semantic attributes to
effectively distinguish between different labels. We propose several metrics
and conduct extensive experiments to investigate these techniques. The results demonstrate
significant improvements in zero-shot and few-shot student performance on
open-vocabulary out-of-distribution classification, highlighting the
effectiveness of our proposed approaches. Our code will be released at
https://github.com/xuanlinli17/large_vlm_distillation_oo
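A minimal sketch of the two principles as loss terms is given below, assuming precomputed student, teacher, and per-label text embeddings; the function name, equal weighting, and temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_img, teacher_img, text_emb, labels, tau=0.07):
    """(1) imitate the teacher's visual representation space;
    (2) align student images with (attribute-enriched) label text
    embeddings. A sketch, not the paper's exact objective."""
    s = F.normalize(student_img, dim=-1)
    t = F.normalize(teacher_img, dim=-1)
    w = F.normalize(text_emb, dim=-1)

    # (1) pull student embeddings toward the teacher's (cosine distance).
    imitation = (1 - (s * t).sum(-1)).mean()

    # (2) contrastive vision-language alignment against the label texts.
    logits = s @ w.T / tau
    alignment = F.cross_entropy(logits, labels)
    return imitation + alignment

loss = distill_loss(torch.randn(8, 512), torch.randn(8, 512),
                    torch.randn(10, 512), torch.randint(0, 10, (8,)))
```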
Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
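As a toy illustration of such reuse, the sketch below keeps a small library of pre-trained modules and composes a frozen feature module with a new task-specific head; the module names and shapes are hypothetical:

```python
import torch
import torch.nn as nn

# A library of modules accumulated over earlier problems (illustrative;
# in practice each entry is a trained, specialised sub-network).
library = {
    "conv_features": nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU()),
    "digits_head": nn.Sequential(nn.Flatten(), nn.Linear(8 * 28 * 28, 10)),
}

# A new problem reuses the frozen feature module and trains only a new head.
for p in library["conv_features"].parameters():
    p.requires_grad = False

new_model = nn.Sequential(
    library["conv_features"],       # reused, frozen
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 5),      # new module for the new problem
)
out = new_model(torch.rand(1, 1, 28, 28))  # forward pass sanity check
```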
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Weakly-Supervised Semantic Segmentation of Airborne LiDAR Point Clouds in Hong Kong Urban Areas
Semantic segmentation of airborne LiDAR point clouds of urban areas is an essential process prior to applying LiDAR data to further applications such as 3D city modeling. Large-scale point cloud semantic segmentation is challenging in practical applications due to the massive data size and time-consuming point-wise annotation. This paper applies a weakly-supervised Semantic Query Network and a sparse point annotation pipeline to practical airborne LiDAR datasets for urban scene semantic segmentation in Hong Kong. The experiments obtained an overall accuracy above 84% and a mean intersection over union above 75%. The capacity of the intensity and return attributes of LiDAR data to classify vegetation and construction is explored and discussed. This work demonstrates an efficient workflow for large-scale airborne LiDAR point cloud semantic segmentation in practice.
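For reference, the overall accuracy and mean intersection over union reported above follow the standard point-wise definitions, sketched here (not the paper's evaluation code):

```python
import numpy as np

def oa_and_miou(pred: np.ndarray, gt: np.ndarray, n_classes: int):
    """Overall accuracy and mean IoU over point-wise labels."""
    oa = float((pred == gt).mean())
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:              # skip classes absent from both
            ious.append(inter / union)
    return oa, float(np.mean(ious))

pred = np.array([0, 1, 1, 2, 2, 2])
gt = np.array([0, 1, 2, 2, 2, 1])
print(oa_and_miou(pred, gt, 3))
```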
Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
The local explanation provides heatmaps on images to explain how
Convolutional Neural Networks (CNNs) derive their output. Due to its visual
straightforwardness, the method has been one of the most popular explainable AI
(XAI) methods for diagnosing CNNs. Through our formative study (S1), however,
we captured ML engineers' ambivalent perspective on the local explanation: a
valuable and indispensable aid in building CNNs, versus a process that
exhausts them due to the heuristic nature of detecting vulnerabilities. Moreover,
steering the CNNs based on the vulnerability learned from the diagnosis seemed
highly challenging. To mitigate the gap, we designed DeepFuse, the first
interactive design that realizes the direct feedback loop between a user and
CNNs in diagnosing and revising CNN's vulnerability using local explanations.
DeepFuse helps CNN engineers to systematically search for "unreasonable" local
explanations and annotate the new boundaries for those identified as
unreasonable in a labor-efficient manner. Next, it steers the model based on
the given annotation such that the model doesn't introduce similar mistakes. We
conducted a two-day study (S2) with 12 experienced CNN engineers. Using
DeepFuse, participants made a more accurate and "reasonable" model than the
current state-of-the-art. Also, participants found the way DeepFuse guides
case-based reasoning can practically improve their current practice. We provide
implications for design that explain how future HCI-driven design can move our
practice forward to make XAI-driven insights more actionable.
Comment: 32 pages, 6 figures, 5 tables. Accepted for publication in the Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 202