Describing Images by Semantic Modeling using Attributes and Tags
This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapts to changes in the training data, naturally solves the problem of feature fusion, and handles the challenge of rare tags. Unlike tags, attributes are category-agnostic, so their combinations can model an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of the human face and body, to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze selfie images using tags and attributes. We collect the first large-scale selfie dataset and annotate it with attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiment of selfies given the estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite its widespread use, the deep learning literature falls short of explaining the characteristics and behavior of batch normalization. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels, of why batch normalization works.
We propose Mixture Normalization, which disentangles modes of variation in the underlying distribution of the layer outputs, and confirm that it effectively accelerates training of different batch-normalized architectures, including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks, while achieving better generalization error.
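The contrast between standard batch normalization and the mixture idea described above can be illustrated in a few lines of NumPy. This is a toy sketch, not the dissertation's implementation: it assumes the mixture means, variances, and weights are given, whereas in practice they would be estimated (e.g., by EM) from the activations.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Standard batch normalization over the batch axis.
    return (x - x.mean(0)) / np.sqrt(x.var(0) + eps)

def mixture_norm(x, means, variances, weights, eps=1e-5):
    """Normalize each sample w.r.t. a K-component diagonal Gaussian mixture.

    x: (N, D) activations; means/variances: (K, D); weights: (K,).
    Each sample is soft-assigned to components, then normalized by the
    responsibility-weighted per-component statistics.
    """
    # Per-component Gaussian log-densities (diagonal covariance).
    log_p = (-0.5 * ((x[:, None, :] - means[None]) ** 2 / (variances[None] + eps)
                     + np.log(2 * np.pi * (variances[None] + eps)))).sum(-1)
    log_p += np.log(weights)[None]
    # Responsibilities r_{nk} via a softmax over components.
    r = np.exp(log_p - log_p.max(1, keepdims=True))
    r /= r.sum(1, keepdims=True)
    # Normalize against each component, then combine by responsibility.
    x_hat = (x[:, None, :] - means[None]) / np.sqrt(variances[None] + eps)
    return (r[:, :, None] * x_hat).sum(1)
```

With a single component (K = 1), the responsibilities collapse to 1 and the procedure reduces exactly to batch normalization, which is the sanity check that motivates the disentangling claim: multiple components let each mode of the activation distribution be normalized by its own statistics.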
A category of kernels for equivariant factorizations, II: further implications
We leverage the results of the prequel in combination with a theorem of D.
Orlov to yield some results in Hodge theory of derived categories of
factorizations and derived categories of coherent sheaves on varieties. In
particular, we provide a conjectural geometric framework to further understand
M. Kontsevich's Homological Mirror Symmetry conjecture. We obtain new cases of
a conjecture of Orlov concerning the Rouquier dimension of the bounded derived
category of coherent sheaves on a smooth variety. Further, we introduce actions
of -graded commutative rings on triangulated categories and their associated
Noether-Lefschetz spectra as a new invariant of triangulated categories. They
are intended to encode information about algebraic classes in the cohomology of
an algebraic variety. We provide some examples to motivate the connection.
Comment: v2: Updated references and addresses. Cleaved off a part. 54 pages. v1: Expanded version of the latter half of arXiv:1105.3177. 92 pages. Comments very welcome.
Implementing Bayesian Inference with Neural Networks
Embodied agents, be they animals or robots, acquire information about the world through their senses. They do not, however, simply discard this information once it has passed, but rather process and store it for future use. The most general theory of how an agent can combine stored knowledge with new observations is Bayesian inference. In this dissertation I present a theory of how embodied agents can learn to implement Bayesian inference with neural networks.
By neural network I mean both artificial and biological neural networks, and in my dissertation I address both kinds. On one hand, I develop theory for implementing Bayesian inference in deep generative models, and I show how to train multilayer perceptrons to compute approximate predictions for Bayesian filtering. On the other hand, I show that several models in computational neuroscience are special cases of the general theory that I develop in this dissertation, and I use this theory to model and explain several phenomena in neuroscience. The key contributions of this dissertation can be summarized as follows:
- I develop a class of graphical models called nth-order harmoniums. An nth-order harmonium is an n-tuple of random variables in which the conditional distribution of each variable given all the others is always an element of the same exponential family. I show that harmoniums have a recursive structure which allows them to be analyzed at coarser and finer levels of detail.
- I define a class of harmoniums called rectified harmoniums, which are constrained to have priors which are conjugate to their posteriors. As a consequence of this, rectified harmoniums afford efficient sampling and learning.
- I develop deep harmoniums: harmoniums that can be represented by hierarchical, undirected graphs. I develop the theory of rectification for deep harmoniums and derive a novel algorithm for training deep generative models.
- I show how to implement a variety of optimal and near-optimal Bayes filters by combining the solution to Bayes' rule provided by rectified harmoniums, with predictions computed by a recurrent neural network. I then show how to train a neural network to implement Bayesian filtering when the transition and emission distributions are unknown.
- I show how some well-established models of neural activity are special cases of the theory I present in this dissertation, and how these models can be generalized with the theory of rectification.
- I show how the theory that I present can model several neural phenomena including proprioception and gain-field modulation of tuning curves.
- I introduce a library for the programming language Haskell, within which I have implemented all the simulations presented in this dissertation. This library uses concepts from Riemannian geometry to provide a rigorous and efficient environment for implementing complex numerical simulations.
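The defining harmonium property, that each conditional lies in a fixed exponential family, is easiest to see in the familiar second-order Bernoulli case, which is exactly a restricted Boltzmann machine. The sketch below is a minimal NumPy illustration of that special case, not the dissertation's Haskell library; parameter initializations are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliHarmonium:
    """Second-order harmonium with Bernoulli conditionals, i.e. an RBM:
    the conditional of each layer given the other is always Bernoulli,
    with natural parameters affine in the conditioning variable."""

    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_hidden, n_visible))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases

    def p_hidden(self, v):
        # Mean of the Bernoulli conditional p(h | v).
        return sigmoid(self.b + v @ self.W.T)

    def p_visible(self, h):
        # Mean of the Bernoulli conditional p(v | h).
        return sigmoid(self.a + h @ self.W)

    def gibbs_step(self, v):
        # Alternate sampling from the two exponential-family conditionals.
        h = (rng.random(self.b.shape) < self.p_hidden(v)).astype(float)
        v = (rng.random(self.a.shape) < self.p_visible(h)).astype(float)
        return v, h
```

Because both conditionals are tractable exponential families, block Gibbs sampling alternates between them; rectification, as described above, constrains such models further so that sampling and learning become efficient.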
I also use the results presented in this dissertation to argue for the fundamental role of neural computation in embodied cognition. I argue, in other words, that before we can build truly intelligent robots, we will need to truly understand biological brains.
Exploring Algorithmic Limits of Matrix Rank Minimization under Affine Constraints
Many applications require recovering a matrix of minimal rank within an
affine constraint set, with matrix completion a notable special case. Because
the problem is NP-hard in general, it is common to replace the matrix rank with
the nuclear norm, which acts as a convenient convex surrogate. While elegant
theoretical conditions elucidate when this replacement is likely to be
successful, they are highly restrictive and convex algorithms fail when the
ambient rank is too high or when the constraint set is poorly structured.
Non-convex alternatives fare somewhat better when carefully tuned; however,
convergence to locally optimal solutions remains a continuing source of
failure. Against this backdrop we derive a deceptively simple and
parameter-free probabilistic PCA-like algorithm that is capable, over a wide
battery of empirical tests, of successful recovery even at the theoretical
limit where the number of measurements equals the degrees of freedom in the
unknown low-rank matrix. Somewhat surprisingly, this is possible even when the
affine constraint set is highly ill-conditioned. While proving general recovery
guarantees remains elusive for non-convex algorithms, Bayesian-inspired or
otherwise, we nonetheless show conditions whereby the underlying cost function
has a unique stationary point located at the global optimum; no existing cost
function we are aware of satisfies this same property. We conclude with a
simple computer vision application involving image rectification and a standard
collaborative filtering benchmark.
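The nuclear-norm surrogate discussed above is concrete enough to sketch. Below is the classic singular value thresholding (SVT) iteration for the convex relaxation of matrix completion; this is the standard baseline the abstract contrasts against, not the parameter-free probabilistic PCA-like algorithm the paper derives.

```python
import numpy as np

def svt_complete(M, mask, tau=1.0, step=1.0, iters=100):
    """Matrix completion via singular value thresholding (SVT).

    Approximately solves: min_X tau*||X||_* + 0.5*||X||_F^2
    subject to X agreeing with M on the observed entries (mask == 1).
    """
    Y = np.zeros_like(M)  # dual variable
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values
        Y += step * mask * (M - X)               # ascend on observed residual
    return X
```

The singular-value shrinkage is what substitutes the nuclear norm for the rank: small singular values are zeroed, so iterates stay low-rank. As the abstract notes, this surrogate fails when the ambient rank is too high or the constraint set is ill-conditioned, which is precisely the regime the paper's non-convex algorithm targets.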
Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction
Click-through rate (CTR) prediction is a critical task in online advertising
systems. A large body of research considers each ad independently, but ignores
its relationship to other ads that may impact the CTR. In this paper, we
investigate various types of auxiliary ads for improving the CTR prediction of
the target ad. In particular, we explore auxiliary ads from two viewpoints: one
is from the spatial domain, where we consider the contextual ads shown above
the target ad on the same page; the other is from the temporal domain, where we
consider historically clicked and unclicked ads of the user. The intuitions are
that ads shown together may influence each other, clicked ads reflect a user's
preferences, and unclicked ads may indicate what a user dislikes to a certain
extent. In order to effectively utilize these auxiliary data, we propose the
Deep Spatio-Temporal neural Networks (DSTNs) for CTR prediction. Our model is
able to learn the interactions between each type of auxiliary data and the
target ad, to emphasize more important hidden information, and to fuse
heterogeneous data in a unified framework. Offline experiments on one public
dataset and two industrial datasets show that DSTNs outperform several
state-of-the-art methods for CTR prediction. We have deployed the
best-performing DSTN in Shenma Search, which is the second largest search
engine in China. The A/B test results show that the online CTR is also
significantly improved compared to our last serving model.
Comment: Accepted by KDD 2019.
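The fusion scheme described above, pooling each type of auxiliary ad against the target ad before a final prediction, can be sketched as a toy forward pass. The function names, the bilinear attention parameter, and the single-layer output are hypothetical simplifications for illustration, not the DSTN architecture itself.

```python
import numpy as np

def attention_pool(target, aux, W):
    """Pool auxiliary ad embeddings (n_aux, d), weighting each by its
    interaction with the target ad; W is a toy bilinear parameter."""
    scores = aux @ W @ target                  # (n_aux,) relevance scores
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()
    return weights @ aux                       # (d,) pooled representation

def dstn_forward(target, contextual, clicked, unclicked, params):
    """Fuse the target ad with pooled spatial (contextual) and temporal
    (clicked / unclicked) auxiliary representations, then predict CTR."""
    pooled = [attention_pool(target, a, params["W"])
              for a in (contextual, clicked, unclicked)]
    features = np.concatenate([target] + pooled)   # unified representation
    logit = features @ params["w_out"] + params["b"]
    return 1.0 / (1.0 + np.exp(-logit))            # predicted CTR
```

The attention weighting captures the abstract's point that each auxiliary ad matters only insofar as it interacts with the target ad, while the concatenation fuses the three heterogeneous sources in one framework.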
Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation
Historical interactions are the default choice for training recommender models, but they typically exhibit high sparsity, i.e., most user-item pairs are unobserved missing data. A standard choice is to treat the missing data as negative training samples and estimate the interaction likelihood between
user-item pairs along with the observed interactions. In this way, some
potential interactions are inevitably mislabeled during training, which will
hurt model fidelity, hindering the model from recalling the mislabeled items,
especially the long-tail ones. In this work, we investigate the mislabeling
issue from a new perspective of aleatoric uncertainty, which describes the
inherent randomness of missing data. This randomness pushes us to go beyond
the interaction likelihood alone and embrace aleatoric uncertainty modeling.
Towards this end, we propose a new Aleatoric Uncertainty-aware Recommendation
(AUR) framework that consists of a new uncertainty estimator along with a
normal recommender model. According to the theory of aleatoric uncertainty, we
derive a new recommendation objective to learn the estimator. As the chance of
mislabeling reflects the potential of a pair, AUR makes recommendations
according to the uncertainty, which is demonstrated to improve the
recommendation performance of less popular items without sacrificing the
overall performance. We instantiate AUR on three representative recommender
models: Matrix Factorization (MF), LightGCN, and VAE from mainstream model
architectures. Extensive results on two real-world datasets validate the
effectiveness of AUR w.r.t. better recommendation results, especially on
long-tail items.
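The two-component design described above, a normal recommender plus a separate uncertainty estimator, can be sketched with a matrix-factorization base model. This is a hypothetical toy parametrization to show the interface, not the AUR architecture or its derived training objective.

```python
import numpy as np

class ToyAUR:
    """Toy sketch of the AUR idea: a matrix-factorization recommender
    paired with a separate aleatoric-uncertainty estimator."""

    def __init__(self, n_users, n_items, d=8, seed=0):
        rng = np.random.default_rng(seed)
        self.P = 0.1 * rng.standard_normal((n_users, d))  # user factors
        self.Q = 0.1 * rng.standard_normal((n_items, d))  # item factors
        self.U = 0.1 * rng.standard_normal((n_users, d))  # uncertainty factors
        self.V = 0.1 * rng.standard_normal((n_items, d))

    def likelihood(self, u, i):
        # Interaction likelihood from the base recommender (MF + sigmoid).
        return 1.0 / (1.0 + np.exp(-self.P[u] @ self.Q[i]))

    def uncertainty(self, u, i):
        # Estimated aleatoric uncertainty of the (u, i) label;
        # softplus keeps it strictly positive.
        return np.log1p(np.exp(self.U[u] @ self.V[i]))

    def recommend(self, u, candidates, k=5):
        # Rank unobserved items by estimated uncertainty: a high chance
        # of mislabeling signals a potentially positive interaction.
        return sorted(candidates, key=lambda i: -self.uncertainty(u, i))[:k]
```

Ranking by uncertainty rather than likelihood alone is the mechanism the abstract credits for recovering less popular, long-tail items: confident negatives are deprioritized, while plausibly mislabeled pairs surface.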
Adaptive whitening with fast gain modulation and slow synaptic plasticity
Neurons in early sensory areas rapidly adapt to changing sensory statistics,
both by normalizing the variance of their individual responses and by reducing
correlations between their responses. Together, these transformations may be
viewed as an adaptive form of statistical whitening. Existing mechanistic
models of adaptive whitening exclusively use either synaptic plasticity or gain
modulation as the biological substrate for adaptation; however, on their own,
each of these models has significant limitations. In this work, we unify these
approaches in a normative multi-timescale mechanistic model that adaptively
whitens its responses with complementary computational roles for synaptic
plasticity and gain modulation. Gains are modified on a fast timescale to adapt
to the current statistical context, whereas synapses are modified on a slow
timescale to match structural properties of the input statistics that are
invariant across contexts. Our model is derived from a novel multi-timescale
whitening objective that factorizes the inverse whitening matrix into basis
vectors, which correspond to synaptic weights, and a diagonal matrix, which
corresponds to neuronal gains. We test our model on synthetic and natural
datasets and find that the synapses learn optimal configurations over long
timescales that enable adaptive whitening on short timescales using gain
modulation.
Comment: NeurIPS 2023 Spotlight; 18 pages, 8 figures.
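The division of labor described above can be caricatured in a toy two-timescale loop: per-neuron gains adapt quickly to drive each response variance toward one, while synaptic weights rotate slowly toward a decorrelating basis. The Sanger-rule synaptic update and the variance-homeostasis gain update below are generic stand-ins for the paper's derived dynamics, chosen only to make the timescale separation concrete.

```python
import numpy as np

def adaptive_whiten(X, eta_gain=0.01, eta_syn=1e-3, n_epochs=20, seed=0):
    """Toy two-timescale whitening: fast gains g normalize each neuron's
    response variance; slow synapses W learn a decorrelating basis."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = np.linalg.qr(rng.standard_normal((d, d)))[0]  # slow synaptic basis
    g = np.ones(d)                                    # fast neuronal gains
    for _ in range(n_epochs):
        for x in X:
            y = W @ x                        # projection onto current basis
            r = g * y                        # gain-modulated response
            g += eta_gain * (1.0 - r * r)    # fast: drive E[r_i^2] toward 1
            # slow: Sanger's rule nudges W toward the data's eigenbasis
            W += eta_syn * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W, g
```

Note the factorization this mirrors: the responses are diag(g) W x, so the gains play the role of the diagonal matrix and the synaptic rows the basis vectors in the whitening objective. After a context change, only g must re-adapt, which is the model's account of rapid sensory adaptation.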