656 research outputs found
Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar ideas (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show that the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show that the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected depending entirely on the available
computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and
Learning Systems
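The SCA iteration described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy one-unit model, the surrogate (linearized loss plus a proximal term, with the L2 regularizer kept exactly), and the names `sca_train`, `tau`, `lam` and `gamma` are all assumptions chosen for illustration.

```python
import math
import random

def predict(w, x):
    """Toy one-unit 'network': a * tanh(b * x)."""
    a, b = w
    return a * math.tanh(b * x)

def sample_gradient(w, x, y):
    """Gradient of 0.5 * (predict(w, x) - y)**2 with respect to (a, b)."""
    a, b = w
    t = math.tanh(b * x)
    err = a * t - y
    return [err * t, err * a * (1.0 - t * t) * x]

def mean_loss(w, data):
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)

def sca_train(data, w0, lam=1e-3, tau=10.0, steps=3000, seed=0):
    rng = random.Random(seed)
    w = list(w0)
    for k in range(steps):
        x, y = rng.choice(data)  # one random example per iteration
        g = sample_gradient(w, x, y)
        # Strongly convex surrogate around the current point w (assumed form):
        #   g.(z - w) + (tau/2)*||z - w||^2 + (lam/2)*||z||^2
        # Only the loss is linearized; the L2 regularizer is kept exactly.
        # Setting the derivative to zero gives the closed-form minimizer.
        w_hat = [(tau * wi - gi) / (tau + lam) for wi, gi in zip(w, g)]
        gamma = 1.0 / (1.0 + 0.01 * k)  # diminishing step size
        # Move from w towards the surrogate minimizer.
        w = [wi + gamma * (hi - wi) for wi, hi in zip(w, w_hat)]
    return w

# Noise-free toy data generated by a = 2.0, b = 0.5.
data = [(x, 2.0 * math.tanh(0.5 * x)) for x in [-3.0 + 0.25 * i for i in range(25)]]
w_start = [0.5, 0.1]
w_final = sca_train(data, w_start)
initial_loss = mean_loss(w_start, data)
final_loss = mean_loss(w_final, data)
print(initial_loss, final_loss)
```

Keeping the regularizer inside the surrogate, instead of linearizing it along with the loss, is one simple way to exploit the structure of the problem while still using only first-order information.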
Learning from distributed data sources using random vector functional-link networks
One of the main characteristics of many real-world big data scenarios is their distributed nature. In a machine learning context, distributed data, together with the requirements of preserving privacy and scaling up to large networks, brings the challenge of designing fully decentralized training protocols. In this paper, we explore the problem of distributed learning when the features of every pattern are spread across multiple agents (as happens, for example, in a distributed database scenario). We propose an algorithm for a particular class of neural networks, known as Random Vector Functional-Link (RVFL) networks, which is based on the Alternating Direction Method of Multipliers (ADMM) optimization algorithm. The proposed algorithm makes it possible to learn an RVFL network from multiple distributed data sources, while restricting communication to the single operation of computing a distributed average. Our experimental simulations show that the algorithm achieves a generalization accuracy comparable to that of a fully centralized solution, while at the same time being extremely efficient.
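The abstract does not say how the distributed average is computed. One common coordinator-free option is iterative consensus, where each agent repeatedly averages its value with those of its neighbors. The sketch below assumes a ring of five agents and equal mixing weights; `consensus_average` and the example values are hypothetical.

```python
def consensus_average(values, neighbors, iterations=100):
    """Each agent repeatedly mixes its value with those of its neighbors."""
    x = list(values)
    for _ in range(iterations):
        x_next = []
        for i, xi in enumerate(x):
            # Equal weights over the agent and its neighbors. On a regular
            # graph (same degree everywhere) this preserves the global average.
            group = [xi] + [x[j] for j in neighbors[i]]
            x_next.append(sum(group) / len(group))
        x = x_next
    return x

# Assumed topology: a ring in which every agent talks to two neighbors.
n_agents = 5
ring = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}
local_values = [1.0, 4.0, 2.0, 8.0, 5.0]
estimates = consensus_average(local_values, ring)
print(estimates)
```

Every agent ends up with an estimate of the global mean while only ever exchanging values with its direct neighbors.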
Widely Linear Kernels for Complex-Valued Kernel Activation Functions
Complex-valued neural networks (CVNNs) have been shown to be powerful
nonlinear approximators when the input data can be properly modeled in the
complex domain. One of the major challenges in scaling up CVNNs in practice is
the design of complex activation functions. Recently, we proposed a novel
framework for learning these activation functions neuron-wise in a
data-dependent fashion, based on a cheap one-dimensional kernel expansion and
the idea of kernel activation functions (KAFs). In this paper we argue that,
despite its flexibility, this framework is still limited in the class of
functions that can be modeled in the complex domain. We leverage the idea of
widely linear complex kernels to extend the formulation, allowing for a richer
expressiveness without an increase in the number of adaptable parameters. We
test the resulting model on a set of complex-valued image classification
benchmarks. Experimental results show that the resulting CVNNs can achieve
higher accuracy while at the same time converging faster.
Comment: Accepted at ICASSP 201
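The one-dimensional kernel expansion behind a KAF can be sketched in a simplified, real-valued form. This is only an illustration under assumptions: the Gaussian kernel, the fixed grid used as dictionary, and the names `kaf`, `alphas` and `gamma` are chosen here, and neither the complex-valued formulation nor the widely linear extension proposed in the paper is reproduced.

```python
import math

def kaf(s, alphas, dictionary, gamma):
    """Real-valued kernel activation: sum_i alphas[i] * exp(-gamma * (s - d_i)**2)."""
    return sum(a * math.exp(-gamma * (s - d) ** 2)
               for a, d in zip(alphas, dictionary))

# Assumed setup: a fixed dictionary sampled on a uniform grid over [-2, 2].
# Only the mixing coefficients would be adapted during training.
dictionary = [-2.0 + 0.5 * i for i in range(9)]
gamma = 2.0

# A single active coefficient at the grid point 0.0 gives a Gaussian bump there.
alphas = [0.0] * len(dictionary)
alphas[4] = 1.0

bump_at_center = kaf(0.0, alphas, dictionary, gamma)
bump_far_away = kaf(2.0, alphas, dictionary, gamma)
print(bump_at_center, bump_far_away)
```

Because the dictionary is fixed, each neuron adds only as many trainable parameters as there are dictionary elements, which is what keeps the expansion cheap.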
Bidirectional deep-readout echo state networks
We propose a deep architecture for the classification of multivariate time
series. By means of a recurrent and untrained reservoir we generate a vectorial
representation that embeds temporal relationships in the data. To improve the
memorization capability, we implement a bidirectional reservoir, whose last
state also captures past dependencies in the input. We apply dimensionality
reduction to the final reservoir states to obtain compressed fixed-size
representations of the time series. These are subsequently fed into a deep
feedforward network trained to perform the final classification. We test our
architecture on benchmark datasets and on a real-world use case of blood
sample classification. Results show that our method performs better than a
standard echo state network and, at the same time, achieves results comparable
to a fully-trained recurrent network, but with faster training.
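The bidirectional reservoir step can be sketched as follows. This is a rough illustration, not the paper's implementation: sharing one untrained reservoir across both directions, the crude weight scaling, and the names `make_reservoir`, `last_state` and `bidirectional_embedding` are assumptions, and the dimensionality reduction and deep readout are omitted.

```python
import math
import random

def make_reservoir(n_in, n_res, scale=0.5, seed=0):
    """Random, untrained input and recurrent weights."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_res)]
    # Crude scaling to keep the recurrent dynamics contractive (an assumption,
    # not a spectral-radius normalization).
    w_res = [[rng.uniform(-1.0, 1.0) * scale / math.sqrt(n_res)
              for _ in range(n_res)] for _ in range(n_res)]
    return w_in, w_res

def last_state(series, w_in, w_res):
    """Run the reservoir over the series and return its final state."""
    h = [0.0] * len(w_res)
    for x in series:
        h = [math.tanh(sum(wi * xi for wi, xi in zip(w_in[i], x)) +
                       sum(wr * hj for wr, hj in zip(w_res[i], h)))
             for i in range(len(w_res))]
    return h

def bidirectional_embedding(series, w_in, w_res):
    """Concatenate the final states of a forward and a time-reversed pass."""
    return last_state(series, w_in, w_res) + last_state(series[::-1], w_in, w_res)

n_res = 10
w_in, w_res = make_reservoir(n_in=2, n_res=n_res)
series = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(20)]
embedding = bidirectional_embedding(series, w_in, w_res)
print(len(embedding))
```

The resulting vector has a fixed size regardless of the series length, so it can be passed to any standard feedforward classifier.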
Adaptation and learning over networks for nonlinear system modeling
In this chapter, we analyze nonlinear filtering problems in distributed
environments, e.g., sensor networks or peer-to-peer protocols. In these
scenarios, the agents in the environment receive measurements in a streaming
fashion, and they are required to estimate a common (nonlinear) model by
alternating local computations and communications with their neighbors. We
focus on the important distinction between single-task problems, where the
underlying model is common to all agents, and multitask problems, where each
agent might converge to a different model due to, e.g., spatial dependencies or
other factors. Currently, most of the literature on distributed learning in the
nonlinear case has focused on the single-task case, which may be a strong
limitation in real-world scenarios. After introducing the problem and reviewing
the existing approaches, we describe a simple kernel-based algorithm tailored
for the multitask case. We evaluate the proposal on a simulated benchmark task,
and we conclude by detailing currently open problems and lines of research.
Comment: To be published as a chapter in `Adaptive Learning Methods for
Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C.
Principe (2018)
Randomness in neural networks: an overview
Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature-based classifiers and nonlinear predictive models. Training neural networks involves the optimization of nonconvex objective functions, and the learning process is usually costly and infeasible for applications associated with data streams. A possible, albeit counterintuitive, alternative is to randomly assign a subset of the networks' weights so that the resulting optimization task can be formulated as a linear least-squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favorable benefits, including (1) simplicity of implementation, (2) faster learning with less intervention from human beings, and (3) the possibility of leveraging the whole range of linear regression and classification algorithms (e.g., ℓ1-norm minimization for obtaining sparse formulations). This class of neural networks is attractive and valuable to the data mining community, particularly for handling large-scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims to provide a self-contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research and open problems and, most importantly, fosters the exchange of well-known results throughout different communities. WIREs Data Mining Knowl Discov 2017, 7:e1200. doi: 10.1002/widm.1200
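The remark that randomization can also approximate kernel functions can be illustrated with random Fourier features for the Gaussian kernel. The abstract does not name this technique, so it is an assumed example, as are the names `make_random_features` and `feature_map` and the bandwidth `sigma`.

```python
import math
import random

def make_random_features(dim, n_features, sigma, seed=0):
    """Draw random frequencies and phases once; they are never trained."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return freqs, phases

def feature_map(x, freqs, phases):
    """Map x to features whose inner products approximate the kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w * xi for w, xi in zip(f, x)) + b)
            for f, b in zip(freqs, phases)]

def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

sigma = 1.0
freqs, phases = make_random_features(dim=3, n_features=2000, sigma=sigma)
x = [0.2, -0.4, 0.1]
y = [0.5, 0.3, -0.2]
zx = feature_map(x, freqs, phases)
zy = feature_map(y, freqs, phases)
approx = sum(a * b for a, b in zip(zx, zy))
exact = gaussian_kernel(x, y, sigma)
print(approx, exact)
```

Once the inputs are mapped through the fixed random features, a linear model trained on them behaves approximately like a kernel method, which ties back to the linear least-squares formulation mentioned above.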
Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation
We study distributed stochastic nonconvex optimization in multi-agent
networks. We introduce a novel algorithmic framework for the distributed
minimization of the sum of the expected value of a smooth (possibly nonconvex)
function (the agents' sum-utility) plus a convex (possibly nonsmooth)
regularizer. The proposed method hinges on successive convex approximation
(SCA) techniques, leveraging dynamic consensus as a mechanism to track the
average gradient among the agents, and recursive averaging to recover the
expected gradient of the sum-utility function. Almost sure convergence to
(stationary) solutions of the nonconvex problem is established. Finally, the
method is applied to distributed stochastic training of neural networks.
Numerical results confirm the theoretical claims, and illustrate the advantages
of the proposed method with respect to other methods available in the
literature.
Comment: Proceedings of 2019 Asilomar Conference on Signals, Systems, and
Computers
Dye diffusion during laparoscopic tubal patency tests may suggest a lymphatic contribution to dissemination in endometriosis: A prospective, observational study
Aim: Women with adenomyosis are at higher risk of endometriosis recurrence after surgery. This study aimed to assess whether the lymphatic vessel network drains from the uterus to nearby organs where endometriosis foci lie. Methods: A prospective, observational study (Canadian Task Force Classification II-2) was conducted at Sacro Cuore Don Calabria Hospital, Negrar, Italy. A total of 104 white women aged 18–43 years were enrolled consecutively. All patients underwent laparoscopy for endometriosis, and a tubal dye test was carried out. Results: Evidence of dye dissemination through the uterine wall and outside the uterus was noted in 27 patients (26%) with adenomyosis: the dye permeated the uterine wall, and a clear passage of the dye was shown in the pelvic lymphatic vessels regardless of whether the tubes were unobstructed. Histological assessment of the uterine biopsies confirmed adenomyosis. Conclusion: Adenomyosis is characterized by ectatic lymphatics that allow the drainage of intrauterine fluids (the dye and, perhaps, menstrual blood) at minimal intrauterine pressure from the uterine cavity through the lymphatic network to extrauterine organs. Certainly, this may not be the only explanation for endometriosis dissemination, but the correlation between the routes of dye drainage and the location of endometriosis foci is highly suggestive.