Stable Tensor Neural Networks for Rapid Deep Learning
We propose a tensor neural network (t-NN) framework that offers an exciting
new paradigm for designing neural networks with multidimensional (tensor) data.
Our network architecture is based on the t-product (Kilmer and Martin, 2011),
an algebraic formulation to multiply tensors via circulant convolution. In this
t-product algebra, we interpret tensors as t-linear operators analogous to
matrices as linear operators, and hence our framework inherits mimetic matrix
properties. To exemplify the elegant, matrix-mimetic algebraic structure of our
t-NNs, we expand on recent work (Haber and Ruthotto, 2017) which interprets
deep neural networks as discretizations of non-linear differential equations
and introduces stable neural networks which promote superior generalization.
Motivated by this dynamic framework, we introduce a stable t-NN which
facilitates more rapid learning because of its reduced, more powerful
parameterization. Through our high-dimensional design, we create a more compact
parameter space and extract multidimensional correlations otherwise latent in
traditional algorithms. We further generalize our t-NN framework to a family
of tensor-tensor products (Kernfeld, Kilmer, and Aeron, 2015) which still
induce a matrix-mimetic algebraic structure. Through numerical experiments on
the MNIST and CIFAR-10 datasets, we demonstrate the more powerful
parameterizations and improved generalizability of stable t-NNs.
Comment: 20 pages, 6 figures, submitted to SIMODS
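For readers unfamiliar with the t-product, here is a minimal sketch of the definition from Kilmer and Martin (2011), in the notation standard for that line of work rather than copied from this paper:

```latex
% For A of size l x m x n with frontal slices A^{(1)},...,A^{(n)},
% and B of size m x p x n, the t-product is
\[
  \mathcal{A} * \mathcal{B}
  = \operatorname{fold}\!\bigl(\operatorname{bcirc}(\mathcal{A})\,
      \operatorname{unfold}(\mathcal{B})\bigr)
  \in \mathbb{R}^{\ell \times p \times n},
\qquad
  \operatorname{bcirc}(\mathcal{A}) =
  \begin{bmatrix}
    A^{(1)} & A^{(n)}   & \cdots & A^{(2)} \\
    A^{(2)} & A^{(1)}   & \cdots & A^{(3)} \\
    \vdots  &           & \ddots & \vdots  \\
    A^{(n)} & A^{(n-1)} & \cdots & A^{(1)}
  \end{bmatrix},
\]
% where unfold stacks the frontal slices of B into an (mn) x p matrix and
% fold inverts the stacking. Because bcirc(A) is block circulant, the
% product diagonalizes under the DFT along the third mode, which is what
% gives the circulant-convolution interpretation its efficiency.
```

It is this matrix-mimetic form that lets familiar matrix constructions (identities, transposes, inverses) carry over to tensor layers.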
Tensor Representation in High-Frequency Financial Data for Price Change Prediction
Nowadays, with the availability of massive amounts of collected trade data,
the dynamics of the financial markets pose both a challenge and an opportunity
for high frequency traders. In order to take advantage of the rapid, subtle
movement of assets in High Frequency Trading (HFT), an automatic algorithm to
analyze and detect patterns of price change based on transaction records must
be available. The multichannel, time-series representation of financial data
naturally suggests tensor-based learning algorithms. In this work, we
investigate the effectiveness of two multilinear methods for the mid-price
prediction problem against other existing methods. Experiments on a
large-scale dataset containing more than 4 million limit orders show that by
utilizing a tensor representation, multilinear models outperform vector-based
approaches and other competing methods.
Comment: accepted at SSCI 2017, typos fixed
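As an illustration of the tensor representation the abstract refers to, the sketch below arranges limit order book snapshots into a third-order tensor; the field names, depth, and shapes are assumptions for illustration, not the paper's code:

```python
import numpy as np

def build_lob_tensor(snapshots, levels=10):
    """Stack limit order book snapshots into a (time, feature, level)
    tensor so multilinear models can exploit correlations across all
    three modes. Each snapshot is assumed to be a dict with keys
    'bid_price', 'bid_size', 'ask_price', 'ask_size', each holding at
    least `levels` book levels."""
    frames = []
    for s in snapshots:
        frame = np.stack([
            s["bid_price"][:levels],
            s["bid_size"][:levels],
            s["ask_price"][:levels],
            s["ask_size"][:levels],
        ])                      # shape: (4 features, levels)
        frames.append(frame)
    return np.stack(frames)    # shape: (time, 4, levels)
```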
On the Relation between Color Image Denoising and Classification
A large portion of the image denoising literature focuses on single-channel
images and often validates the proposed methods experimentally on at most
tens of images. In this paper, we investigate the interaction between
denoising and classification on a large-scale dataset. Inspired by
classification models, we propose a novel deep learning architecture for
color (multichannel) image denoising and report results on thousands of
images from the ImageNet dataset as well as commonly used imagery. We study
the importance of (sufficient) training data and how semantic class
information can be traded for improved denoising results. As a result, our
method greatly improves PSNR performance, by 0.34 to 0.51 dB on average over
state-of-the-art methods on the large-scale dataset. We conclude that it is
beneficial to incorporate denoising into classification models. On the other
hand, we also study how noise affects classification performance. In the
end, we come to a number of interesting conclusions, some of them
counter-intuitive.
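For reference, the PSNR gains quoted above would be measured as in the standard definition below; this is not code from the paper:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio for 8-bit images; higher is better.
    A gain of 0.34-0.51 dB, as reported above, corresponds to a small
    but consistent reduction in mean squared error."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```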
Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
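A minimal sketch of the two phases described above, using PyTorch-style masks; the selection rules (gradient magnitude for growth, weight magnitude for pruning) follow the abstract, but the fractions and bookkeeping are illustrative assumptions:

```python
import torch

def grow(mask, grad, grow_frac=0.05):
    """Activate the inactive connections with the largest gradient
    magnitudes, adding capacity for newly arrived data."""
    scores = grad.abs().masked_fill(mask != 0, 0.0)  # ignore active weights
    k = int(grow_frac * mask.numel())
    idx = torch.topk(scores.flatten(), k).indices
    mask.view(-1)[idx] = 1.0
    return mask

def prune(mask, weight, prune_frac=0.05):
    """Deactivate the active connections with the smallest weight
    magnitudes, recovering compactness after growth."""
    mags = weight.abs().masked_fill(mask == 0, float("inf"))
    k = int(prune_frac * int(mask.sum().item()))
    idx = torch.topk(mags.flatten(), k, largest=False).indices
    mask.view(-1)[idx] = 0.0
    return mask
```

Alternating these two steps on incoming data, rather than retraining from scratch, is what yields the reported training-cost savings.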
Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games
Researchers in artificial intelligence have achieved human-level performance
in large-scale perfect-information games, but it is still a challenge to
achieve (nearly) optimal results (in other words, an approximate Nash
Equilibrium) in large-scale imperfect-information games (e.g., war games,
football coaching, or business strategies). Neural Fictitious Self Play (NFSP) is
an effective algorithm for learning approximate Nash equilibrium of
imperfect-information games from self-play without prior domain knowledge.
However, it relies on the Deep Q-Network, which is offline and hard to
converge in online games with changing opponent strategies, so it cannot
approach an approximate Nash equilibrium in games with a large search scale
and deep search depth. In this paper, we propose Monte Carlo Neural
Fictitious Self Play (MC-NFSP), an algorithm that combines Monte Carlo tree
search with NFSP, which
greatly improves the performance on large-scale zero-sum imperfect-information
games. Experimentally, we demonstrate that the proposed Monte Carlo Neural
Fictitious Self Play can converge to an approximate Nash equilibrium in
games with large search depth, while Neural Fictitious Self Play cannot.
Furthermore, we develop Asynchronous Neural Fictitious Self Play (ANFSP),
which uses an asynchronous, parallel architecture to collect game
experience. In experiments, we show that parallel actor-learners have a
further accelerating and stabilizing effect on training.
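A sketch of the anticipatory policy mixture at the heart of NFSP-style methods, with Monte Carlo tree search as the best-response oracle as MC-NFSP proposes; `mcts_best_response` and `average_policy` are hypothetical callables, and the mixing constant is illustrative:

```python
import random

def act(state, mcts_best_response, average_policy, eta=0.1):
    """With probability eta, play the (approximate) best response computed
    by Monte Carlo tree search; otherwise play the learned average policy.
    Best-response actions are flagged so the (state, action) pair can be
    added to the supervised buffer that trains the average policy."""
    if random.random() < eta:
        return mcts_best_response(state), True
    return average_policy(state), False
```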
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, this problem is still challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by
simultaneously perceiving global structures and local details. Four
landmark-located patch networks are proposed to attend to local textures, in
addition to the commonly used global encoder-decoder network. Beyond the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss and identity preserving loss.
The combined loss function leverages both frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Different from previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity preserving image for downstream
tasks like face recognition and attribute estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art methods on large-pose face recognition.
Comment: accepted at ICCV 2017, main paper & supplementary material, 11 pages
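In schematic form, the combined objective named above can be written as below; the weights and the exact form of each term are illustrative, not the paper's:

```latex
% Synthesis loss: adversarial realism, left-right facial symmetry, and
% identity preservation via features f of a pre-trained face model.
\[
  \mathcal{L} \;=\; \mathcal{L}_{\mathrm{adv}}
    \;+\; \lambda_{\mathrm{sym}} \, \mathcal{L}_{\mathrm{sym}}
    \;+\; \lambda_{\mathrm{ip}}
      \bigl\lVert f(I_{\mathrm{syn}}) - f(I_{\mathrm{gt}}) \bigr\rVert_2^2 ,
\]
% where I_syn is the synthesized frontal view, I_gt the ground-truth
% frontal image, and L_sym penalizes differences between the synthesized
% image and its horizontal mirror.
```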
MoleculeNet: A Benchmark for Molecular Machine Learning
Molecular machine learning has been maturing rapidly over the last few years.
Improved methods and the presence of larger datasets have enabled machine
learning algorithms to make increasingly accurate predictions about molecular
properties. However, algorithmic progress has been limited due to the lack of a
standard benchmark to compare the efficacy of proposed methods; most new
algorithms are benchmarked on different datasets, making it challenging to gauge
the quality of proposed methods. This work introduces MoleculeNet, a large
scale benchmark for molecular machine learning. MoleculeNet curates multiple
public datasets, establishes metrics for evaluation, and offers high quality
open-source implementations of multiple previously proposed molecular
featurization and learning algorithms (released as part of the DeepChem open
source library). MoleculeNet benchmarks demonstrate that learnable
representations are powerful tools for molecular machine learning and broadly
offer the best performance. However, this result comes with caveats. Learnable
representations still struggle to deal with complex tasks under data scarcity
and highly imbalanced classification. For quantum mechanical and biophysical
datasets, the use of physics-aware featurizations can be more important than
the choice of a particular learning algorithm.
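Since MoleculeNet ships inside DeepChem, a benchmark run looks roughly like the sketch below; loader names and signatures vary across DeepChem releases, so treat this as an illustration rather than canonical usage:

```python
import deepchem as dc

# Load a MoleculeNet dataset with graph featurization and standard splits.
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="GraphConv")
train, valid, test = datasets

# Train a learnable (graph-convolutional) representation on the benchmark.
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode="classification")
model.fit(train, nb_epoch=10)

# Score with the benchmark's metric (ROC-AUC for Tox21 classification).
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print(model.evaluate(test, [metric], transformers))
```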
Efficient Network Construction through Structural Plasticity
Deploying Deep Neural Networks (DNNs) on hardware incurs excessive computation
cost due to the massive number of parameters. A typical training pipeline to
mitigate over-parameterization is to pre-define a DNN structure first with
redundant learning units (filters and neurons) under the goal of high accuracy,
then to prune redundant learning units after training with the purpose of
efficient inference. We argue that it is sub-optimal to introduce redundancy
into training for the purpose of reducing redundancy later in inference.
Moreover, the fixed network structure further results in poor adaptation to
dynamic tasks, such as lifelong learning. In contrast, structural plasticity
plays an indispensable role in mammalian brains to achieve compact and accurate
learning. Throughout a lifetime, active connections are continuously created
while those no longer important degenerate. Inspired by this observation,
we propose a training scheme, Continuous Growth and Pruning (CGaP), in which
we start training from a small network seed, execute continuous growth by
adding important learning units, and finally prune secondary ones for
efficient inference. The inference model generated by CGaP is structurally
sparse, substantially decreasing inference power consumption and latency
when deployed on hardware platforms. With popular DNN structures on
representative datasets, the efficacy of CGaP is benchmarked by both algorithm
simulation and architectural modeling on Field-programmable Gate Arrays (FPGA).
For example, CGaP decreases the FLOPs, model size, DRAM access energy and
inference latency by 63.3%, 64.0%, 11.8% and 40.2%, respectively, for
ResNet-110 on CIFAR-10.
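Because CGaP grows and prunes whole learning units (filters and neurons) rather than individual weights, the resulting sparsity stays hardware-friendly. A minimal sketch of unit-level selection by magnitude, with illustrative names and fractions rather than the authors' code:

```python
import torch

def filters_to_prune(conv_weight, prune_frac=0.1):
    """conv_weight: (out_channels, in_channels, kH, kW). Rank filters by
    L1 norm and return the indices of the weakest ones, which a
    CGaP-style scheme would remove wholesale so the remaining network
    stays dense and regular for hardware deployment."""
    scores = conv_weight.abs().sum(dim=(1, 2, 3))  # one score per filter
    k = int(prune_frac * conv_weight.shape[0])
    return torch.topk(scores, k, largest=False).indices
```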
Efficient Structured Pruning and Architecture Searching for Group Convolution
Efficient inference for Convolutional Neural Networks has recently become a
thriving topic. It is desirable to achieve the maximal test accuracy under given
inference budget constraints when deploying a pre-trained model. Network
pruning is a commonly used technique, but it may produce irregularly sparse
models that can hardly achieve actual speed-ups. Group convolution is a
promising pruning target due to its regular structure; however, incorporating
such structure into the pruning procedure is challenging, because structural
constraints are hard to describe and can make pruning intractable to solve.
The need to configure the group convolution architecture, i.e., the number of
groups, to maximise test accuracy adds further difficulty.
This paper presents an efficient method to address this challenge. We
formulate group convolution pruning as finding the optimal channel permutation
to impose structural constraints and solve it efficiently by heuristics. We
also apply local search to explore group configurations based on estimated
pruning cost to maximise test accuracy. Compared to prior work, results show
that our method produces competitive group convolution models for various tasks
within a shorter pruning period and enables rapid group configuration
exploration subject to inference budget constraints.
Comment: Published as an ICCV'19 NEUARCH workshop paper
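The core idea, finding a channel permutation under which only block-diagonal weights need to be kept, can be sketched as below for a 1x1 convolution; the scoring is illustrative, and the paper's heuristics for searching permutations are not reproduced:

```python
import numpy as np

def block_diag_mask(n_out, n_in, groups):
    """Boolean mask selecting the block-diagonal entries that a group
    convolution with `groups` groups is allowed to keep."""
    mask = np.zeros((n_out, n_in), dtype=bool)
    ro, ri = n_out // groups, n_in // groups
    for g in range(groups):
        mask[g * ro:(g + 1) * ro, g * ri:(g + 1) * ri] = True
    return mask

def retained_magnitude(W, out_perm, in_perm, groups):
    """Total |weight| kept when output/input channels are permuted and
    the layer is constrained to a group convolution; a permutation
    search would try to maximise this before pruning off-block entries."""
    permuted = np.abs(W[np.ix_(out_perm, in_perm)])
    return permuted[block_diag_mask(*W.shape, groups)].sum()
```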
Medical Knowledge Embedding Based on Recursive Neural Network for Multi-Disease Diagnosis
The representation of knowledge based on first-order logic captures the
richness of natural language and supports multiple probabilistic inference
models. Although symbolic representation enables quantitative reasoning with
statistical probability, it is difficult to utilize with machine learning
models as they perform numerical operations. In contrast, knowledge embedding
(i.e., high-dimensional and continuous vectors) is a feasible approach to
complex reasoning that can not only retain the semantic information of
knowledge but also establish the quantifiable relationship among them. In this
paper, we propose recursive neural knowledge network (RNKN), which combines
medical knowledge based on first-order logic with recursive neural network for
multi-disease diagnosis. After RNKN is efficiently trained from manually
annotated Chinese Electronic Medical Records (CEMRs), diagnosis-oriented
knowledge embeddings and weight matrices are learned. Experimental results
verify that the diagnostic accuracy of RNKN is superior to that of some
classical machine learning models and the Markov logic network (MLN). The results
also demonstrate that the more explicit the evidence extracted from CEMRs is,
the better the achieved performance. RNKN gradually exhibits the
interpretability of its knowledge embeddings as the number of training
epochs increases.
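A minimal sketch of the recursive composition underlying networks like RNKN; the dimensions, naming, and tanh composition are generic recursive-neural-network conventions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class RecursiveUnit(nn.Module):
    """Merges two child knowledge embeddings (e.g., two clauses of a
    first-order rule) into one parent embedding of the same dimension,
    so a logic formula becomes a tree of learned compositions."""
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)

    def forward(self, left, right):
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))
```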