Implicit Training of Energy Model for Structure Prediction
Most deep learning research has focused on developing new models and training
procedures. On the other hand, the training objective has usually been
restricted to combinations of standard losses. When the objective aligns well
with the evaluation metric, this is not a major issue. However, when dealing
with complex structured outputs, the ideal objective can be hard to optimize,
and the efficacy of the usual objectives as a proxy for the true objective can
be questionable. In this work, we argue that existing inference-network-based
structure prediction methods (Tu and Gimpel 2018; Tu, Pang, and Gimpel 2020)
are indirectly learning to optimize a dynamic loss objective parameterized by
the energy model. We then explore using implicit-gradient-based techniques to
learn the corresponding dynamic objectives. Our experiments show that
implicitly learning a dynamic loss landscape is an effective method for
improving model performance in structure prediction.
Comment: AAA
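The implicit-gradient idea above can be illustrated on a toy bilevel problem. This is a minimal sketch under assumed forms (the quadratic energy, closed-form inner minimiser, and all names are illustrative, not the paper's implementation): an inner step minimises an energy-parameterized loss, and an outer step updates the energy parameter through the implicit derivative of the inner minimiser.

```python
# Toy dynamic-loss learning via the implicit function theorem (illustrative).
# The "energy model" E_phi(y) = (y - phi)**2 defines a dynamic inner loss;
# the outer (task) loss (y* - y_true)**2 is minimised over phi by
# differentiating through y*(phi) implicitly.

def inner_argmin(phi):
    # For E_phi(y) = (y - phi)^2 the inner minimiser is y* = phi (closed form).
    return phi

def implicit_grad(phi):
    # dy*/dphi = -(d2E/dy2)^-1 * d2E/(dy dphi) = -(1/2) * (-2) = 1.
    return 1.0

def train_energy_model(y_true, phi=0.0, lr=0.5, steps=50):
    for _ in range(steps):
        y_star = inner_argmin(phi)
        # Chain rule: d/dphi (y* - y_true)^2 = 2 (y* - y_true) * dy*/dphi.
        outer_grad = 2.0 * (y_star - y_true) * implicit_grad(phi)
        phi -= lr * outer_grad
    return phi

phi = train_energy_model(y_true=3.0)
print(round(inner_argmin(phi), 4))  # prints 3.0
```

The learned energy parameter drives the inner minimiser toward the task target, which is the mechanism the abstract describes at scale.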
Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
Existing methods, such as concept bottleneck models (CBMs), have been
successful in providing concept-based interpretations for black-box deep
learning models. They typically work by predicting concepts given the input and
then predicting the final class label given the predicted concepts. However,
(1) they often fail to capture the high-order, nonlinear interaction between
concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not
help correct highly correlated concepts (e.g., "yellow belly"), leading to
suboptimal final accuracy; (2) they cannot naturally quantify the complex
conditional dependencies between different concepts and class labels (e.g., for
an image with the class label "Kentucky Warbler" and a concept "black bill",
what is the probability that the model correctly predicts another concept
"black crown"), therefore failing to provide deeper insight into how a
black-box model works. In response to these limitations, we propose
Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural
networks to define the joint energy of candidate (input, concept, class)
tuples. With such a unified interface, prediction, concept correction, and
conditional dependency quantification are then represented as conditional
probabilities, which are generated by composing different energy functions. Our
ECBMs address both limitations of existing CBMs, providing higher accuracy and
richer concept interpretations. Empirical results show that our approach
outperforms the state-of-the-art on real-world datasets.
Comment: Accepted by ICLR 202
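The composition of energy functions into conditional probabilities can be sketched with a toy table. This is an illustrative example only (the concepts, classes, and energy values are made up, not from the paper): given joint energies over (concept, class) pairs, a conditional probability is a softmax over the relevant negative energies.

```python
import math

# Hypothetical toy energies E(concept, class); lower energy = more compatible.
energy = {
    ("yellow_breast", "Kentucky Warbler"): 0.2,
    ("yellow_breast", "Other"): 1.5,
    ("black_bill", "Kentucky Warbler"): 0.4,
    ("black_bill", "Other"): 1.1,
}

def p_class_given_concept(concept, classes):
    # Conditional probability by composing energies: softmax over -E,
    # normalising only over the candidates consistent with the condition.
    scores = [math.exp(-energy[(concept, y)]) for y in classes]
    z = sum(scores)
    return {y: s / z for y, s in zip(classes, scores)}

probs = p_class_given_concept("yellow_breast", ["Kentucky Warbler", "Other"])
```

Prediction, concept correction, and dependency quantification differ only in which variables are conditioned on and which are normalised over, which is the "unified interface" the abstract refers to.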
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
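Task (i) of the taxonomy, predicting link existence, can be illustrated with a classic common-neighbours heuristic. This is a minimal sketch on a made-up graph (the heuristic is one standard choice, not the survey's specific method): candidate node pairs without an edge are scored by how many neighbours they share.

```python
from itertools import combinations

# A tiny undirected graph; common-neighbour counts score candidate links
# (link-existence prediction, task (i) in the taxonomy). Illustrative only.
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")}

def neighbours(v):
    # Collect the other endpoint of every edge incident to v.
    return {x if y == v else y for x, y in edges if v in (x, y)}

def common_neighbour_scores(nodes):
    # Score each non-edge pair by the size of the neighbourhood overlap.
    scores = {}
    for u, v in combinations(sorted(nodes), 2):
        if (u, v) not in edges and (v, u) not in edges:
            scores[(u, v)] = len(neighbours(u) & neighbours(v))
    return scores

scores = common_neighbour_scores({"a", "b", "c", "d"})  # {("a", "d"): 2}
```

The same scaffold extends to the other taxonomy tasks: replace the score with a label classifier (task ii), a weight estimator (task iii), or a feature constructor (task iv).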
Hybrid system identification using switching density networks
Behaviour cloning is a commonly used strategy for imitation learning and can
be extremely effective in constrained domains. However, in cases where the
dynamics of an environment may be state dependent and varying, behaviour
cloning places a burden on model capacity and the number of demonstrations
required. This paper introduces switching density networks, which rely on a
categorical reparametrisation for hybrid system identification. This results in
a network comprising a classification layer that is followed by a regression
layer. We use switching density networks to predict the parameters of hybrid
control laws, which are toggled by a switching layer to produce different
controller outputs, when conditioned on an input state. This work shows how
switching density networks can be used for hybrid system identification in a
variety of tasks, successfully identifying the key joint angle goals that make
up manipulation tasks, while simultaneously learning image-based goal
classifiers and regression networks that predict joint angles from images. We
also show that they can cluster the phase space of an inverted pendulum,
identifying the balance, spin and pump controllers required to solve this task.
Switching density networks can be difficult to train, but we introduce a cross
entropy regularisation loss that stabilises training.
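The switching structure can be sketched in a few lines. This is an illustrative toy, not the paper's architecture (the linear gate and controllers, and all parameter values, are assumptions): a softmax classification layer gates a bank of linear control laws, and an entropy term penalises indecisive gates, in the spirit of the regulariser mentioned above.

```python
import math

# Toy switching density network: a softmax gate selects among linear
# controllers u = k*x + b, conditioned on the input state x. Illustrative only.

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def switching_output(x, gate_w, controllers):
    # gate_w: per-mode gate weights; controllers: (k, b) linear control laws.
    probs = softmax([w * x for w in gate_w])
    u = sum(p * (k * x + b) for p, (k, b) in zip(probs, controllers))
    return u, probs

def entropy_regulariser(probs):
    # Entropy of the gate distribution; lower means a more decisive switch.
    return -sum(p * math.log(p + 1e-12) for p in probs)

u, probs = switching_output(2.0, gate_w=[1.0, -1.0],
                            controllers=[(0.5, 0.0), (-0.5, 1.0)])
```

Training would minimise the regression loss plus a weighted entropy (or cross-entropy) term on `probs`, encouraging the gate to commit to one controller per region of the state space.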
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data-modeling technology rooted in the
second-quantization formulation of quantum mechanics. It is an all-purpose,
incremental, and unsupervised data storage and retrieval system which can be
applied to all types of signal or data, structured or unstructured, textual or
not. mARC can be applied to a wide range of information classification and
retrieval problems such as e-Discovery or contextual navigation. It can also
be formulated in the artificial-life framework, a.k.a. Conway's "Game of
Life". In contrast to Conway's approach, the objects evolve in a massively
multidimensional space. To start evaluating the potential of mARC, we have
built a mARC-based Internet search engine demonstrator with contextual
functionality. We compare the behavior of the mARC demonstrator with Google
search in terms of both performance and relevance. In this study we find that
the mARC search engine demonstrator outperforms Google search by an order of
magnitude in response time while providing more relevant results for some
classes of queries.
Multimodal Human Group Behavior Analysis
Human behaviors in a group setting involve a complex mixture of multiple modalities: audio, visual, linguistic, and human interactions. With the rapid progress of AI, automatic prediction and understanding of these behaviors is no longer a dream. In a negotiation, discovering human relationships and identifying the dominant person can be useful for decision making. In security settings, detecting nervous behaviors can help law enforcement agents spot suspicious people. In adversarial settings such as national elections and court defense, identifying persuasive speakers is a critical task. It is therefore beneficial to build accurate machine learning (ML) models to predict such human group behaviors.

There are two elements for successful prediction of group behaviors. The first is to design domain-specific features for each modality. Social and psychological studies have uncovered various factors, including both individual cues and group interactions, which inspire us to extract relevant features computationally. In particular, the group-interaction modality plays an important role, since human behaviors influence each other through interactions in a group. The second is effective multimodal ML models that align and integrate the different modalities for accurate predictions. However, most previous work ignored the group-interaction modality; moreover, it adopted only early fusion or late fusion to combine modalities, which is not optimal.

This thesis presents methods to train models on multimodal inputs from group-interaction videos and to predict human group behaviors. First, we develop an ML algorithm to automatically predict human interactions from videos, which is the basis for extracting interaction features and modeling group behaviors. Second, we propose a multimodal method to identify dominant people in videos from multiple modalities. Third, we study nervousness in human behavior by developing a hybrid method: group-interaction feature engineering combined with individual facial-embedding learning. Last, we introduce a multimodal fusion framework that enables us to predict how persuasive speakers are.

Overall, we develop one algorithm to extract group interactions and build three multimodal models to identify three kinds of human behavior in videos: dominance, nervousness, and persuasion. The experiments demonstrate the efficacy of the methods and analyze the modality-wise contributions.
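The two fusion baselines the thesis critiques can be contrasted in a toy sketch. This is illustrative only, assuming linear scorers and two made-up modalities (the feature values and weights are not from the thesis): early fusion concatenates modality features before one predictor, while late fusion averages per-modality predictions.

```python
# Early vs. late fusion of multimodal features (toy linear scorers).

def score(features, weights):
    # A linear scorer: dot product of features and weights.
    return sum(f * w for f, w in zip(features, weights))

def early_fusion(audio, visual, w_joint):
    # Concatenate modality features first, then apply one joint scorer.
    return score(audio + visual, w_joint)

def late_fusion(audio, visual, w_audio, w_visual):
    # Score each modality separately, then average the predictions.
    return 0.5 * (score(audio, w_audio) + score(visual, w_visual))

a, v = [1.0, 2.0], [0.5]
e = early_fusion(a, v, w_joint=[0.1, 0.2, 0.4])        # e == 0.7
l = late_fusion(a, v, w_audio=[0.1, 0.2], w_visual=[0.4])  # l == 0.35
```

Neither scheme models cross-modal interactions directly, which is the gap a learned fusion framework, such as the one the thesis proposes, aims to close.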