Towards Building Deep Networks with Bayesian Factor Graphs
We propose a Multi-Layer Network based on the Bayesian framework of
Factor Graphs in Reduced Normal Form (FGrn), applied to a two-dimensional
lattice. The Latent Variable Model (LVM) is the basic building block of a
quadtree hierarchy built on top of a bottom layer of random variables that
represent pixels of an image, a feature map, or more generally a collection of
spatially distributed discrete variables. The multi-layer architecture
implements a hierarchical data representation that, via belief propagation, can
be used for learning and inference. Typical uses are pattern completion,
correction, and classification. The FGrn paradigm provides great flexibility and
modularity, and is a promising candidate for building deep networks: the
system can easily be extended by introducing new variables that differ in
cardinality and in type. Prior knowledge, or supervised information, can be
introduced at different scales. The FGrn paradigm also provides a handy way to
build all kinds of architectures by interconnecting only three types of
units: Single-Input Single-Output (SISO) blocks, Sources, and Replicators. The
network is designed like a circuit diagram, and belief messages flow
bidirectionally through the whole system. The learning algorithms operate only
locally within each block. The framework is demonstrated in this paper on a
three-layer structure applied to images extracted from a standard data set.
Comment: Submitted for journal publication
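As a rough illustration of how the three FGrn unit types could interact through belief messages, consider the sketch below. It is a minimal toy in numpy under our own assumptions: the function names, message conventions, and example distributions are illustrative, not the authors' implementation.

```python
# Toy sketch of message passing through the three FGrn unit types
# (SISO block, Source, Replicator). All names, shapes, and distributions
# are illustrative assumptions, not the paper's reference code.
import numpy as np

def normalize(m):
    """Scale a discrete belief message so it sums to one."""
    return m / m.sum()

def source(prior):
    """Source: emits its prior distribution as a constant message."""
    return normalize(np.asarray(prior, dtype=float))

def siso_forward(P, msg_in):
    """SISO block: push a forward message through the conditional
    probability matrix P (rows: input states, columns: output states)."""
    return normalize(msg_in @ P)

def siso_backward(P, msg_out):
    """SISO block: push a backward message against P."""
    return normalize(P @ msg_out)

def replicator(messages):
    """Replicator (equality constraint): combine incoming messages by
    element-wise product."""
    out = np.ones_like(messages[0])
    for m in messages:
        out *= m
    return normalize(out)

# Toy example: a 3-state latent variable feeding a 2-state observable.
P = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.1, 0.9]])
prior = source([0.3, 0.3, 0.4])
forward = siso_forward(P, prior)                   # belief about the child
backward = siso_backward(P, np.array([1.0, 0.0]))  # evidence: child in state 0
posterior = replicator([prior, backward])          # updated belief on the parent
print(forward, posterior)
```

Learning in the actual framework is likewise local: each SISO block would update its own matrix P from the messages crossing it, which is what keeps the architecture modular.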
Associative Embedding for Game-Agnostic Team Discrimination
Assigning team labels to players in a sports game is not a trivial task when
no prior is known about the visual appearance of each team. Our work builds on
a Convolutional Neural Network (CNN) to learn a descriptor, namely a pixel-wise
embedding vector, that is similar for pixels depicting players from the same
team, and dissimilar when pixels correspond to distinct teams. The advantage of
this idea is that no per-game learning is needed, allowing efficient team
discrimination as soon as the game starts. In principle, the approach follows
the associative embedding framework introduced in arXiv:1611.05424 to
differentiate instances of objects. Our work differs, however, in that it
derives the embeddings from a lightweight segmentation network and, more
fundamentally, in that it allows the same embedding to be assigned to
unconnected pixels, as required for pixels of distinct players from the same
team. Excellent results, both in terms of team labelling accuracy and
generalization to new games/arenas, have been achieved on panoramic views of a
large variety of basketball games involving player interactions and
occlusions. This makes our method a good candidate to integrate team separation
in many CNN-based sports analytics pipelines.
Comment: Published in the CVPR 2019 workshop Computer Vision in Sports, under the name "Associative Embedding for Team Discrimination" (http://openaccess.thecvf.com/content_CVPRW_2019/html/CVSports/Istasse_Associative_Embedding_for_Team_Discrimination_CVPRW_2019_paper.html)
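To make the pull/push intuition behind associative embedding concrete, here is a hedged numpy sketch of a team-discrimination objective: pixel embeddings are pulled toward their team's mean while the team means are pushed at least a margin apart. The function name, the margin value, and the exact loss form are assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch of an associative-embedding style loss for team
# discrimination; the loss form and names are illustrative assumptions.
import numpy as np

def team_embedding_loss(embeddings, team_masks, margin=1.0):
    """embeddings: (H, W, D) pixel-wise embedding map from the CNN.
    team_masks: list of boolean (H, W) masks, one per team."""
    means = [embeddings[m].mean(axis=0) for m in team_masks]

    # Pull term: pixels of a team should sit close to the team's mean,
    # even when those pixels belong to unconnected players.
    pull = sum(((embeddings[m] - mu) ** 2).sum(axis=1).mean()
               for m, mu in zip(team_masks, means))

    # Push term: means of distinct teams should be at least `margin` apart.
    push = 0.0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            dist = np.linalg.norm(means[i] - means[j])
            push += max(0.0, margin - dist) ** 2
    return pull + push

# Toy usage: a random 8x8 map of 4-dimensional embeddings, two teams.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 8, 4))
mask_a = np.zeros((8, 8), dtype=bool)
mask_a[:, :4] = True
mask_b = ~mask_a
print(team_embedding_loss(emb, [mask_a, mask_b]))
```

Because the loss only compares embeddings within and across teams, nothing in it is tied to a particular game or arena, which is what allows team discrimination without per-game training.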
Energy Transformer
Transformers have become the de facto models of choice in machine learning,
typically leading to impressive performance on many applications. At the same
time, the architectural development in the transformer world is mostly driven
by empirical findings, and the theoretical understanding of their architectural
building blocks is rather limited. In contrast, Dense Associative Memory models
or Modern Hopfield Networks have a well-established theoretical foundation, but
have not yet demonstrated truly impressive practical results. We propose a
transformer architecture that replaces the sequence of feedforward transformer
blocks with a single large Associative Memory model. Our novel architecture,
called Energy Transformer (or ET for short), has many of the familiar
architectural primitives that are often used in the current generation of
transformers. However, it is not identical to existing architectures. The
sequence of transformer layers in ET is purposely designed to minimize a
specifically engineered energy function, which is responsible for representing
the relationships between the tokens. As a consequence of this computational
principle, the attention in ET is different from the conventional attention
mechanism. In this work, we introduce the theoretical foundations of ET,
explore its empirical capabilities using the image completion task, and obtain
strong quantitative results on the graph anomaly detection task.
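To make the energy-minimization principle concrete, the sketch below treats a simplified attention energy, E = -(1/beta) * sum_i logsumexp_j(beta * q_i . k_j), and updates the tokens by gradient descent on it, so that attention weights appear inside the gradient rather than as a fixed layer. The energy form, the shapes, and the step size are simplified assumptions based on the abstract, not the paper's exact equations.

```python
# Hedged numpy sketch of energy-based attention: tokens descend the
# gradient of a scalar energy instead of passing through a fixed
# softmax-attention layer. All forms and constants are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def energy_and_grad(X, Wq, Wk, beta=1.0):
    """Attention energy over tokens X (N, d) and its gradient w.r.t. X."""
    q, k = X @ Wq, X @ Wk                 # queries and keys, (N, h)
    scores = beta * (q @ k.T)             # (N, N) pairwise similarities
    m = scores.max(axis=1)
    lse = m + np.log(np.exp(scores - m[:, None]).sum(axis=1))
    energy = -(1.0 / beta) * lse.sum()
    A = softmax(scores, axis=1)           # attention weights, from dE/dscores
    grad = -(A @ k) @ Wq.T - (A.T @ q) @ Wk.T
    return energy, grad

def et_step(X, Wq, Wk, step_size=0.1):
    """One update in the spirit of ET: move tokens downhill on the energy.
    (A complete model would also normalize tokens to keep the dynamics
    bounded; this toy omits that.)"""
    _, g = energy_and_grad(X, Wq, Wk)
    return X - step_size * g

# Toy usage: five 8-dimensional tokens relax under the energy.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq = rng.normal(size=(8, 4)) / np.sqrt(8)
Wk = rng.normal(size=(8, 4)) / np.sqrt(8)
for _ in range(5):
    X = et_step(X, Wq, Wk)
print(energy_and_grad(X, Wq, Wk)[0])
```

Note how the softmax appears only as the derivative of the log-sum-exp term in the energy: this is the sense in which attention derived from an energy function differs from, yet resembles, the conventional mechanism.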