1,410 research outputs found
Gated networks: an inventory
Gated networks are networks that contain gating connections, in which the
outputs of at least two neurons are multiplied. Initially, gated networks were
used to learn relationships between two input sources, such as pixels from two
images. More recently, they have been applied to learning activity recognition
or multi-modal representations. The aims of this paper are threefold: 1) to
explain the basic computations in gated networks to the non-expert, while
adopting a standpoint that insists on their symmetric nature. 2) to serve as a
quick reference guide to the recent literature, by providing an inventory of
applications of these networks, as well as recent extensions to the basic
architecture. 3) to suggest future research directions and applications.Comment: Unpublished manuscript, 17 page
Linking Image and Text with 2-Way Nets
Linking two data sources is a basic building block in numerous computer
vision problems. Canonical Correlation Analysis (CCA) achieves this by
utilizing a linear optimizer in order to maximize the correlation between the
two views. Recent work makes use of non-linear models, including deep learning
techniques, that optimize the CCA loss in some feature space. In this paper, we
introduce a novel, bi-directional neural network architecture for the task of
matching vectors from two data sources. Our approach employs two tied neural
network channels that project the two views into a common, maximally correlated
space using the Euclidean loss. We show a direct link between the
correlation-based loss and Euclidean loss, enabling the use of Euclidean loss
for correlation maximization. To overcome common Euclidean regression
optimization problems, we modify well-known techniques to our problem,
including batch normalization and dropout. We show state of the art results on
a number of computer vision matching tasks including MNIST image matching and
sentence-image matching on the Flickr8k, Flickr30k and COCO datasets.Comment: 14 pages, 2 figures, 6 table
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the
exploration of deep neural networks on recommender systems has received
relatively less scrutiny. In this work, we strive to develop techniques based
on neural networks to tackle the key problem in recommendation -- collaborative
filtering -- on the basis of implicit feedback. Although some recent work has
employed deep learning for recommendation, they primarily used it to model
auxiliary information, such as textual descriptions of items and acoustic
features of musics. When it comes to model the key factor in collaborative
filtering -- the interaction between user and item features, they still
resorted to matrix factorization and applied an inner product on the latent
features of users and items. By replacing the inner product with a neural
architecture that can learn an arbitrary function from data, we present a
general framework named NCF, short for Neural network-based Collaborative
Filtering. NCF is generic and can express and generalize matrix factorization
under its framework. To supercharge NCF modelling with non-linearities, we
propose to leverage a multi-layer perceptron to learn the user-item interaction
function. Extensive experiments on two real-world datasets show significant
improvements of our proposed NCF framework over the state-of-the-art methods.
Empirical evidence shows that using deeper layers of neural networks offers
better recommendation performance.Comment: 10 pages, 7 figure
FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning
Federated learning (FL) enables a decentralized machine learning paradigm for
multiple clients to collaboratively train a generalized global model without
sharing their private data. Most existing works simply propose typical FL
systems for single-modal data, thus limiting its potential on exploiting
valuable multimodal data for future personalized applications. Furthermore, the
majority of FL approaches still rely on the labeled data at the client side,
which is limited in real-world applications due to the inability of
self-annotation from users. In light of these limitations, we propose a novel
multimodal FL framework that employs a semi-supervised learning approach to
leverage the representations from different modalities. Bringing this concept
into a system, we develop a distillation-based multimodal embedding knowledge
transfer mechanism, namely FedMEKT, which allows the server and clients to
exchange the joint knowledge of their learning models extracted from a small
multimodal proxy dataset. Our FedMEKT iteratively updates the generalized
global encoders with the joint embedding knowledge from the participating
clients. Thereby, to address the modality discrepancy and labeled data
constraint in existing FL systems, our proposed FedMEKT comprises local
multimodal autoencoder learning, generalized multimodal autoencoder
construction, and generalized classifier learning. Through extensive
experiments on three multimodal human activity recognition datasets, we
demonstrate that FedMEKT achieves superior global encoder performance on linear
evaluation and guarantees user privacy for personal data and model parameters
while demanding less communication cost than other baselines
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
Traditional architectures for solving computer vision problems and the degree
of success they enjoyed have been heavily reliant on hand-crafted features.
However, of late, deep learning techniques have offered a compelling
alternative -- that of automatically learning problem-specific features. With
this new paradigm, every problem in computer vision is now being re-examined
from a deep learning perspective. Therefore, it has become important to
understand what kind of deep networks are suitable for a given problem.
Although general surveys of this fast-moving paradigm (i.e. deep-networks)
exist, a survey specific to computer vision is missing. We specifically
consider one form of deep networks widely used in computer vision -
convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN
and then examine the broad variations proposed over time to suit different
applications. We hope that our recipe-style survey will serve as a guide,
particularly for novice practitioners intending to use deep-learning techniques
for computer vision.Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm
- …