Hierarchical Label Inference for Video Classification
Videos are a rich source of high-dimensional structured data, with a wide
range of interacting components at varying levels of granularity. In order to
improve understanding of unconstrained internet videos, it is important to
consider the role of labels at separate levels of abstraction. In this paper,
we consider the use of the Bidirectional Inference Neural Network (BINN) for
performing graph-based inference in label space for the task of video
classification. We take advantage of the inherent hierarchy between labels at
increasing granularity. The BINN is evaluated on the first and second releases
of the YouTube-8M large-scale multi-label video dataset. Our results
demonstrate the effectiveness of the BINN, achieving significant improvements
over baseline models.
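For readers unfamiliar with the approach, the following is a minimal sketch of graph-based inference over a two-level label hierarchy. The module, its names, and its sizes are illustrative assumptions for exposition, not the BINN paper's actual architecture.

import torch
import torch.nn as nn

class BidirectionalLabelInference(nn.Module):
    """Toy bidirectional inference over a two-level label hierarchy.

    Independent coarse- and fine-level logits from any video backbone
    are refined by passing messages down (coarse -> fine) and up
    (fine -> coarse). Illustrative only, not the paper's exact BINN.
    """

    def __init__(self, n_coarse: int, n_fine: int):
        super().__init__()
        self.down = nn.Linear(n_coarse, n_fine)  # coarse evidence -> fine labels
        self.up = nn.Linear(n_fine, n_coarse)    # fine evidence -> coarse labels

    def forward(self, coarse_logits, fine_logits):
        fine_refined = fine_logits + self.down(torch.sigmoid(coarse_logits))
        coarse_refined = coarse_logits + self.up(torch.sigmoid(fine_logits))
        return coarse_refined, fine_refined

# Example: refine logits for 25 coarse and 400 fine labels (hypothetical sizes).
binn = BidirectionalLabelInference(n_coarse=25, n_fine=400)
coarse, fine = binn(torch.randn(8, 25), torch.randn(8, 400))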
Deep Sparse Coding for Invariant Multimodal Halle Berry Neurons
Deep feed-forward convolutional neural networks (CNNs) have become ubiquitous
in virtually all machine learning and computer vision challenges; however,
advancements in CNNs have arguably reached an engineering saturation point
where incremental novelty results in minor performance gains. Although there is
evidence that object classification has reached human levels on narrowly
defined tasks, for general applications, the biological visual system is far
superior to that of any computer. Research reveals there are numerous missing
components in feed-forward deep neural networks that are critical in mammalian
vision. The brain does not work solely in a feed-forward fashion, but rather
all of the neurons are in competition with each other; neurons are integrating
information in a bottom up and top down fashion and incorporating expectation
and feedback in the modeling process. Furthermore, our visual cortex is working
in tandem with our parietal lobe, integrating sensory information from various
modalities.
In our work, we sought to improve upon standard feed-forward deep
learning models by augmenting them with biologically inspired concepts of
sparsity, top-down feedback, and lateral inhibition. We define our model as a
sparse coding problem using hierarchical layers. We solve the sparse coding
problem with an additional top-down feedback error driving the dynamics of the
neural network. While building and observing the behavior of our model, we were
fascinated that multimodal, invariant neurons naturally emerged, mimicking the
"Halle Berry neurons" found in the human brain. Furthermore, our sparse
representation of multimodal signals demonstrates qualitative and quantitative
superiority to the standard feed-forward joint embedding in common vision and
machine learning tasks.
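As a rough illustration of the dynamics described above, here is one ISTA-style update for a two-layer sparse code with a top-down feedback term. The dictionaries, step sizes, and the exact form of the feedback are assumptions for exposition, not the authors' implementation.

import numpy as np

def soft(v, t):
    """Soft-thresholding: proximal operator of the L1 sparsity penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_step(x, a1, a2, D1, D2, lr=0.1, lam=0.1):
    """One ISTA-style update of a two-layer sparse code with feedback.

    Layer 1 (code a1) reconstructs the input x through dictionary D1;
    layer 2 (code a2) reconstructs a1 through D2. The top-down term
    pulls a1 toward the layer-2 prediction D2 @ a2, playing the role
    of expectation/feedback. Shapes: x (n,), D1 (n, k1), D2 (k1, k2).
    """
    bottom_up = D1.T @ (x - D1 @ a1)   # reconstruction error from below
    top_down = D2 @ a2 - a1            # feedback error from above
    a1 = soft(a1 + lr * (bottom_up + top_down), lr * lam)
    a2 = soft(a2 + lr * (D2.T @ (a1 - D2 @ a2)), lr * lam)
    return a1, a2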
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
Structured Label Inference for Visual Understanding
Visual data such as images and videos contain a rich source of structured
semantic labels as well as a wide range of interacting components. Visual
content can be assigned fine-grained labels describing major components,
coarse-grained labels depicting high level abstractions, or a set of labels
revealing attributes. Such categorization over different, interacting layers of
labels evinces the potential for a graph-based encoding of label information.
In this paper, we exploit this rich structure for performing graph-based
inference in label space for a number of tasks: multi-label image and video
classification and action detection in untrimmed videos. We consider the use of
the Bidirectional Inference Neural Network (BINN) and Structured Inference
Neural Network (SINN) for performing graph-based inference in label space and
propose a Long Short-Term Memory (LSTM) based extension for exploiting activity
progression in untrimmed videos. The methods were evaluated on (i) the Animals
with Attributes (AwA), Scene Understanding (SUN), and NUS-WIDE datasets for
multi-label image classification, (ii) the first two releases of the YouTube-8M
large scale dataset for multi-label video classification, and (iii) the
THUMOS'14 and MultiTHUMOS video datasets for action detection. Our results
demonstrate the effectiveness of structured label inference in these
challenging tasks, achieving significant improvements over baselines.
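The LSTM-based extension for untrimmed videos can be pictured as label-space smoothing over time. The sketch below is a hypothetical minimal version of that idea, not the exact BINN/SINN formulation from the paper.

import torch
import torch.nn as nn

class TemporalLabelInference(nn.Module):
    """Toy LSTM refinement of per-frame action scores.

    Consumes a sequence of frame-level label logits and re-scores them
    with temporal context, so detections reflect activity progression.
    Hidden size and wiring are illustrative.
    """

    def __init__(self, n_labels: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_labels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, frame_logits):        # (batch, time, n_labels)
        states, _ = self.lstm(frame_logits)
        return self.head(states)            # refined per-frame logits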
Application of Deep Learning on Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations
We explore how Deep Learning (DL) can be utilized to predict prognosis of
acute myeloid leukemia (AML). Out of TCGA (The Cancer Genome Atlas) database,
94 AML cases are used in this study. Input data comprise age, 10 common
cytogenetic results, and the 23 most common mutation results; the output is
the prognosis (time from diagnosis to death, DTD). In our DL network,
autoencoders are stacked to form
a hierarchical DL model from which raw data are compressed and organized and
high-level features are extracted. The network is written in R language and is
designed to predict prognosis of AML for a given case (DTD of more than or less
than 730 days). The DL network achieves an excellent accuracy of 83% in
predicting prognosis. As a proof-of-concept study, our preliminary results
demonstrate a practical application of DL in future practice of prognostic
prediction using next-gen sequencing (NGS) data.
Comment: 11 pages, 1 table, 1 figure. arXiv admin note: substantial text
overlap with arXiv:1801.0101
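The abstract's 34 input features (age, 10 cytogenetic results, 23 mutations) and binary DTD target suggest a small stacked-autoencoder pipeline. The sketch below uses Python/PyTorch purely for illustration (the authors state their network is written in R), and all layer sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

# 34 inputs: age + 10 cytogenetic + 23 mutation features (per the abstract).
encoder = nn.Sequential(nn.Linear(34, 16), nn.ReLU(), nn.Linear(16, 8), nn.ReLU())
decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 34))
classifier = nn.Linear(8, 1)  # predicts DTD > 730 days (logit)

x = torch.randn(94, 34)       # stand-in for the 94 TCGA AML cases
y = torch.randint(0, 2, (94, 1)).float()

# Stage 1: unsupervised pretraining compresses and organizes the raw features.
recon_loss = F.mse_loss(decoder(encoder(x)), x)
# Stage 2: the compressed code drives the supervised prognosis head.
clf_loss = F.binary_cross_entropy_with_logits(classifier(encoder(x)), y)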
Maximum a Posteriori Adaptation of Network Parameters in Deep Models
We present a Bayesian approach to adapting parameters of a well-trained
context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to
improve automatic speech recognition performance. Given an abundance of DNN
parameters but with only a limited amount of data, the effectiveness of the
adapted DNN model can often be compromised. We formulate maximum a posteriori
(MAP) adaptation of the parameters of a specially designed CD-DNN-HMM with an
augmented linear hidden network connected to the output tied states, or
senones, and compare it to the previously proposed feature-space MAP linear
regression. Experimental evidence on the 20,000-word open-vocabulary Wall
Street Journal task demonstrates the feasibility of the proposed framework. In
supervised adaptation, the proposed MAP adaptation approach provides more than
10% relative error reduction and consistently outperforms the conventional
transformation based methods. Furthermore, we present an initial attempt to
generate hierarchical priors to improve adaptation efficiency and effectiveness
with limited adaptation data by exploiting similarities among senones.
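In its textbook form, MAP adaptation of network weights under a Gaussian prior centered at the well-trained model reduces to prior-regularized retraining; the paper's exact prior may differ, but the standard formulation is

\[
\hat{W} = \arg\max_{W}\; \log p(\mathcal{D}\mid W) + \log p(W),
\qquad
p(W) = \mathcal{N}\!\left(W \mid W_{0},\, \sigma^{2} I\right),
\]

which is equivalent to

\[
\hat{W} = \arg\min_{W}\; \mathcal{L}_{\mathrm{CE}}(\mathcal{D}; W)
\;+\; \frac{\lambda}{2}\,\lVert W - W_{0}\rVert_{2}^{2},
\]

where \(W_{0}\) are the speaker-independent weights, \(\mathcal{D}\) is the limited adaptation data, and \(\lambda \propto 1/\sigma^{2}\) controls how far that data may move the parameters away from the prior.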
Towards WARSHIP: Combining Components of Brain-Inspired Computing of RSH for Image Super Resolution
Evolution of deep learning shows that some algorithmic tricks are more
durable, while others are not. To the best of our knowledge, we are the first
to summarize five of the more durable and complete deep learning components
for vision, namely WARSHIP. Moreover, we give a biological overview of
WARSHIP, emphasizing its brain-inspired computing. As a step towards WARSHIP,
our case study of image super resolution combines three components of RSH to
deploy a CNN model, WARSHIP-XZNet, which strikes a happy medium between speed
and performance.
Comment: 2018 5th IEEE International Conference on Cloud Computing and
Intelligence Systems
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features
This work presents a novel method of exploring human brain-visual
representations, with a view towards replicating these processes in machines.
The core idea is to learn plausible computational and biological
representations by correlating human neural activity and natural images. Thus,
we first propose a model, EEG-ChannelNet, to learn a brain manifold for EEG
classification. After verifying that visual information can be extracted from
EEG data, we introduce a multimodal approach that uses deep image and EEG
encoders, trained in a siamese configuration, for learning a joint manifold
that maximizes a compatibility measure between visual features and brain
representations. We then carry out image classification and saliency detection
on the learned manifold. Performance analyses show that our approach
satisfactorily decodes visual information from neural signals. This, in turn,
can be used to effectively supervise the training of deep learning models, as
demonstrated by the high performance of image classification and saliency
detection on out-of-training classes. The obtained results show that the
learned brain-visual features lead to improved performance and simultaneously
bring deep models more in line with cognitive neuroscience work related to
visual perception and attention.
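The compatibility measure between the two encoders' outputs can be made concrete with a simple contrastive objective over a siamese pair. The function below is an illustrative stand-in under that assumption, not the paper's exact loss.

import torch
import torch.nn.functional as F

def compatibility_loss(img_emb, eeg_emb, margin=0.2):
    """Hinge-style loss pulling matched image/EEG embeddings together.

    img_emb, eeg_emb: (batch, d) outputs of the image and EEG encoders,
    where row i of each encodes the same visual stimulus. Illustrative
    stand-in for the paper's compatibility objective.
    """
    img = F.normalize(img_emb, dim=1)
    eeg = F.normalize(eeg_emb, dim=1)
    sim = img @ eeg.t()                 # pairwise cosine compatibilities
    pos = sim.diag().unsqueeze(1)       # matched pairs on the diagonal
    # Mismatched pairs should score at least `margin` below their match;
    # the constant diagonal term contributes no gradient.
    return F.relu(margin + sim - pos).mean()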
Music Generation by Deep Learning - Challenges and Directions
In addition to traditional tasks such as prediction, classification and
translation, deep learning is receiving growing attention as an approach for
music generation, as witnessed by recent research groups such as Magenta at
Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is
in using the capacity of deep learning architectures and training techniques to
automatically learn musical styles from arbitrary musical corpora and then to
generate samples from the estimated distribution. However, a direct application
of deep learning to generate content rapidly reaches limits as the generated
content tends to mimic the training set without exhibiting true creativity.
Moreover, deep learning architectures do not offer direct ways for controlling
generation (e.g., imposing some tonality or other arbitrary constraints).
Furthermore, deep learning architectures alone are self-contained automata that
generate music autonomously without human user interaction, far from the
objective of interactively assisting musicians to compose and refine music.
Issues such as control, structure, creativity, and interactivity are the focus
of our analysis. In this paper, we select some limitations of a direct
application of deep learning to music generation, analyze why these issues
remain unresolved, and discuss possible approaches to addressing them. Various
recent systems are cited as examples of promising directions.
Comment: 17 pages. arXiv admin note: substantial text overlap with
arXiv:1709.01620. Accepted for publication in Special Issue on Deep learning
for music and audio, Neural Computing & Applications, Springer Nature, 201
Capacity allocation analysis of neural networks: A tool for principled architecture design
Designing neural network architectures is a task that lies somewhere between
science and art. For a given task, some architectures are eventually preferred
over others, based on a mix of intuition, experience, experimentation and luck.
For many tasks, the final word is attributed to the loss function, while for
some others a further perceptual evaluation is necessary to assess and compare
performance across models. In this paper, we introduce the concept of capacity
allocation analysis, with the aim of shedding some light on what network
architectures focus their modelling capacity on, when used on a given task. We
focus more particularly on spatial capacity allocation, which analyzes a
posteriori the effective number of parameters that a given model has allocated
for modelling dependencies on a given point or region in the input space, in
linear settings. We use this framework to perform a quantitative comparison
between some classical architectures on various synthetic tasks. Finally, we
consider how capacity allocation might translate to non-linear settings.
Comment: 25 pages, 15 figures
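To make "spatial capacity allocation" concrete in the linear case, one crude proxy is to ask how much of the end-to-end linear map's mass each input coordinate receives. The simplified stand-in below is our own illustration under that assumption, not the paper's precise definition.

import numpy as np

def spatial_capacity_proxy(layers):
    """Crude proxy for spatial capacity allocation of a linear network.

    layers: weight matrices W1..Wk of a purely linear network computing
    Wk @ ... @ W1 @ x. Returns, per input coordinate, the squared column
    norm of the composed map, i.e. how much modelling 'mass' the network
    devotes to that input position. A simplification for illustration.
    """
    end_to_end = layers[0]
    for W in layers[1:]:
        end_to_end = W @ end_to_end
    return (end_to_end ** 2).sum(axis=0)

# Example: three random 16x16 linear layers over a 16-dimensional input.
rng = np.random.default_rng(0)
print(spatial_capacity_proxy([rng.normal(size=(16, 16)) for _ in range(3)]))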