Tensor Contraction Layers for Parsimonious Deep Nets
Tensors offer a natural representation for many kinds of data frequently
encountered in machine learning. Images, for example, are naturally represented
as third order tensors, where the modes correspond to height, width, and
channels. Tensor methods are noted for their ability to discover
multi-dimensional dependencies, and tensor decompositions in particular have
been used to produce compact low-rank approximations of data. In this paper, we
explore the use of tensor contractions as neural network layers and investigate
several ways to apply them to activation tensors. Specifically, we propose the
Tensor Contraction Layer (TCL), the first attempt to incorporate tensor
contractions as end-to-end trainable neural network layers. Applied to existing
networks, TCLs reduce the dimensionality of the activation tensors and thus the
number of model parameters. We evaluate the TCL on the task of image
recognition, augmenting two popular networks (AlexNet and VGG) on the CIFAR100
and ImageNet datasets; the resulting models remain trainable end-to-end.
Measuring the effect of parameter reduction via tensor contraction on
performance, we demonstrate significant model compression without significant
loss of accuracy and, in some cases, improved performance.
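The core operation of a TCL can be sketched in a few lines. The following is a hedged illustration using plain NumPy mode-n products, not the authors' implementation; the shapes, contraction ranks, and function names are all made up for the example:

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Contract `matrix` (shape r_n x d_n) with `tensor` along `mode`."""
    # Move the contracted mode to the front, multiply, then move it back.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def tensor_contraction_layer(activations, factors):
    """Apply one (learnable) factor matrix per non-batch mode,
    shrinking each mode while keeping the tensor structure."""
    out = activations
    for mode, w in enumerate(factors, start=1):  # mode 0 is the batch
        out = mode_n_product(out, w, mode)
    return out

# Toy activation tensor: batch of 2, height 8, width 8, 16 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8, 16))
# Hypothetical contraction ranks (4, 4, 8) for the three non-batch modes.
factors = [rng.standard_normal((4, 8)),
           rng.standard_normal((4, 8)),
           rng.standard_normal((8, 16))]
y = tensor_contraction_layer(x, factors)
print(y.shape)  # (2, 4, 4, 8)
```

In a trained network the factor matrices would be learned by backpropagation; here they are random, since the point is only the shape arithmetic: every downstream layer now sees a smaller tensor, which is where the parameter savings come from.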
Tensor Regression Networks
Convolutional neural networks typically consist of many convolutional layers
followed by one or more fully connected layers. While convolutional layers map
between high-order activation tensors, the fully connected layers operate on
flattened activation vectors. Despite empirical success, this approach has
notable drawbacks. Flattening followed by fully connected layers discards
multilinear structure in the activations and requires many parameters. We
address these problems by incorporating tensor algebraic operations that
preserve multilinear structure at every layer. First, we introduce Tensor
Contraction Layers (TCLs) that reduce the dimensionality of their input while
preserving their multilinear structure using tensor contraction. Next, we
introduce Tensor Regression Layers (TRLs), which express outputs through a
low-rank multilinear mapping from a high-order activation tensor to an output
tensor of arbitrary order. We learn the contraction and regression factors
end-to-end, and produce accurate nets with fewer parameters. Additionally, our
layers regularize networks by imposing low-rank constraints on the activations
(TCL) and regression weights (TRL). Experiments on ImageNet show that, applied
to VGG and ResNet architectures, TCLs and TRLs reduce the number of parameters
compared to fully connected layers by more than 65% while maintaining or
increasing accuracy. In addition to the space savings, our approach's ability
to leverage topological structure can be crucial for structured data such as
MRI. In particular, we demonstrate significant performance improvements over
comparable architectures on three tasks associated with the UK Biobank dataset.
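The low-rank multilinear regression described above can be sketched with a Tucker-factored weight tensor. This is an illustrative reconstruction, not the paper's code; all shapes and ranks are invented for the example:

```python
import numpy as np

def trl_forward(x, core, factors, bias):
    """Low-rank tensor regression layer: contract an activation tensor
    x of shape (batch, d1, d2, d3) with a Tucker-factored weight tensor
    instead of flattening it for a fully connected layer."""
    # Project each activation mode onto its low-rank subspace.
    z = np.einsum('bijk,ir,js,kt->brst', x, *factors)
    # Contract with the core tensor to produce the output vector.
    return np.einsum('brst,rsto->bo', z, core) + bias

# Made-up shapes: 2 samples, 4x4x8 activations, Tucker ranks (2, 2, 3),
# and 10 output classes.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 8))
factors = [rng.standard_normal((4, 2)),
           rng.standard_normal((4, 2)),
           rng.standard_normal((8, 3))]
core = rng.standard_normal((2, 2, 3, 10))
bias = np.zeros(10)
out = trl_forward(x, core, factors, bias)
print(out.shape)  # (2, 10)
```

The parameter saving is easy to see from the shapes: a fully connected layer on the flattened 4x4x8 activations would need 128 x 10 = 1280 weights, while the factored form above stores 8 + 8 + 24 + 120 = 160.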
Born Again Neural Networks
Knowledge distillation (KD) consists of transferring knowledge from one
machine learning model (the teacher) to another (the student). Commonly, the
teacher is a high-capacity model with formidable performance, while the student
is more compact; by transferring knowledge, one hopes to obtain a compact model
with performance close to the teacher's. We study KD from a new perspective:
rather than compressing models, we train students parameterized identically to
their teachers. Surprisingly, these Born-Again Networks (BANs) outperform
their teachers significantly,
both on computer vision and language modeling tasks. Our experiments with BANs
based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10
(3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional
experiments explore two distillation objectives: (i) Confidence-Weighted by
Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP).
Both methods elucidate the essential components of KD, demonstrating a role of
the teacher outputs on both predicted and non-predicted classes. We present
experiments with students of various capacities, focusing on the under-explored
case where students overpower teachers. Our experiments show significant
advantages from transferring knowledge between DenseNets and ResNets in either
direction. (Published at ICML 2018.)
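A generic distillation objective of the kind KD builds on can be sketched as below. This is a standard soft-target formulation for illustration only, not the paper's CWTM or DKPP variants; the temperature and mixing weight are arbitrary:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend cross-entropy on the true labels with cross-entropy
    against the teacher's temperature-softened output distribution."""
    n = len(labels)
    soft_targets = softmax(teacher_logits, T)
    log_p_soft = np.log(softmax(student_logits, T) + 1e-12)
    soft_term = -(soft_targets * log_p_soft).sum(axis=-1).mean()
    hard_term = -np.log(softmax(student_logits)[np.arange(n), labels] + 1e-12).mean()
    return alpha * soft_term + (1 - alpha) * hard_term

# Toy logits for 3 samples and 4 classes (values are arbitrary).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((3, 4))
student = rng.standard_normal((3, 4))
labels = np.array([0, 2, 1])
loss = distillation_loss(student, teacher, labels)
```

Note that nothing in this loss requires the student to be smaller than the teacher, which is exactly the degree of freedom the born-again setup exploits: teacher and student share the same architecture, and only the soft targets differ from ordinary training.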
Learning Causal State Representations of Partially Observable Environments
Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of the DQN and tabular models using approximate causal state representations with respect to recurrent-DQN baselines trained with raw inputs.
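The first step of the pipeline described above, compressing an action-observation history into a recurrent hidden state, can be sketched as follows. This is a minimal vanilla-RNN illustration under made-up dimensions, not the authors' trained predictive model:

```python
import numpy as np

def rnn_hidden_states(observations, Wx, Wh, b):
    """Roll a vanilla RNN over an observation history. Each hidden
    state summarizes the history so far; when the RNN is trained to
    predict the next observation, these states serve as candidate
    approximations of causal states."""
    h = np.zeros(Wh.shape[0])
    states = []
    for obs in observations:
        h = np.tanh(Wx @ obs + Wh @ h + b)
        states.append(h.copy())
    return np.array(states)

# Made-up dimensions: a 10-step history of 6-dim observations, 8-dim state.
rng = np.random.default_rng(0)
obs_history = rng.standard_normal((10, 6))
Wx = rng.standard_normal((8, 6)) * 0.1
Wh = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)
states = rnn_hidden_states(obs_history, Wx, Wh, b)
print(states.shape)  # (10, 8)
```

In the full method the RNN weights would be trained on next-observation prediction, and histories whose hidden states induce the same predictive distribution would be grouped together; the sketch only shows the history-compression step.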
Canine parvovirus (CPV) phylogeny is associated with disease severity
After its first identification in 1978, canine parvovirus (CPV) has been recognized all around the world as a major threat to canine population health. This ssDNA virus is characterized by a high substitution rate, and several genetic and phenotypic variants have emerged over time. Overall, the definition of 3 main antigenic variants was established based on specific amino acid markers located in a precise capsid position. However, the detection of several minor variants and incongruence observed between the antigenic classification and phylogeny have cast doubt on the reliability of this scheme. At the same time, CPV heterogeneity has favored the hypothesis of a differential virulence among variants, although no robust and consistent demonstration has been provided yet. The present study rejects the antigenic variant concept and attempts to evaluate the association between CPV strain phylogeny, reconstructed using the whole information contained in the VP2 coding gene, and several clinical and hemato-biochemical parameters, assessed from 34 CPV-infected dogs at admission. Using different statistical approaches, the results of the present study show an association between viral phylogeny and host parameters ascribable to the immune system, coagulation profile, acute phase response and, more generally, to the overall picture of the animal response. In particular, a strong and significant phylogenetic signal was proven for neutrophil count and WBC. Therefore, despite the limited sample size, a relation between viral phylogeny and disease severity has been observed for the first time, suggesting that CPV virulence is an inherited trait. The likely existence of clades with different virulence highlights once more the relevance of intensive epidemiological monitoring and research on CPV evolution to better understand the virulence determinants and their epidemiology, and to develop adequate countermeasures.
Supervised classification of combined copy number and gene expression data
Summary: In this paper we apply a predictive profiling method to genome copy number aberrations (CNA) in combination with gene expression and clinical data to identify molecular patterns of cancer pathophysiology. Predictive models and optimal feature lists for the platforms are developed by a complete-validation SVM-based machine learning system. Ranked lists of genome CNA sites (assessed by comparative genomic hybridization arrays, aCGH) and of differentially expressed genes (assessed by microarray profiling with Affy HG-U133A chips) are computed and combined on a breast cancer dataset for the discrimination of Luminal/ER+ (Lum/ER+) and Basal-like/ER- classes. Different encodings are developed and applied to the CNA data, and predictive variable selection is discussed. We analyze the combination of profiling information between the platforms, also considering the pathophysiological data. A specific subset of patients is identified that has a different response to classification by chromosomal gains and losses and by differentially expressed genes, corroborating the idea that genomic CNA can represent an independent source for tumor classification.
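The rank-and-combine step across two platforms can be sketched as below. The ranking score here is a simple mean-difference statistic standing in for the paper's SVM-based ranking, and all shapes and names are invented for the example:

```python
import numpy as np

def rank_features(X, y):
    """Rank features by |class-mean difference| / std for a two-class
    problem; a simple stand-in for the SVM-based ranking in the paper."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    score = np.abs(m0 - m1) / (X.std(axis=0) + 1e-9)
    return np.argsort(-score)  # best features first

def combine_platforms(X_cna, X_expr, y, k=5):
    """Keep the top-k ranked features from each platform (CNA sites
    and expression probes) and concatenate them into one matrix."""
    keep_cna = rank_features(X_cna, y)[:k]
    keep_expr = rank_features(X_expr, y)[:k]
    return np.hstack([X_cna[:, keep_cna], X_expr[:, keep_expr]])

# Synthetic stand-ins: 20 samples, 30 CNA sites, 50 expression probes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X_cna = rng.standard_normal((20, 30))
X_expr = rng.standard_normal((20, 50))
X_combined = combine_platforms(X_cna, X_expr, y, k=5)
print(X_combined.shape)  # (20, 10)
```

In the actual study the ranking, encoding of gains/losses, and downstream SVM classification are all wrapped in a complete cross-validation loop to avoid selection bias; the sketch shows only the feature-combination idea.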