    Exploiting Heterogeneity in Operational Neural Networks by Synaptic Plasticity

    The recently proposed network model, Operational Neural Networks (ONNs), can generalize the conventional Convolutional Neural Networks (CNNs), which are homogeneous and use only a linear neuron model. As a heterogeneous network model, ONNs are based on a generalized neuron model that can encapsulate any set of non-linear operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. However, the default search method for finding optimal operators in ONNs, the so-called Greedy Iterative Search (GIS) method, usually takes several training sessions to find a single operator set per layer. This is not only computationally demanding but also limits network heterogeneity, since the same set of operators is then used for all neurons in each layer. To address this deficiency and exploit a superior level of heterogeneity, this study focuses on searching for the best possible operator set(s) for the hidden neurons of the network based on the Synaptic Plasticity paradigm, which constitutes the essential learning theory in biological neurons. During training, each operator set in the library can be evaluated by its synaptic plasticity level and ranked from worst to best, and an elite ONN can then be configured using the top-ranked operator sets found at each hidden layer. Experimental results over highly challenging problems demonstrate that elite ONNs, even with few neurons and layers, can achieve learning performance superior to that of GIS-based ONNs, and as a result the performance gap over CNNs widens further. Comment: 15 pages, 19 figures, journal manuscript

    Operational Neural Networks

    Feed-forward, fully-connected Artificial Neural Networks (ANNs), or the so-called Multi-Layer Perceptrons (MLPs), are well-known universal approximators. However, their learning performance varies significantly depending on the function or the solution space that they attempt to approximate. This is mainly because of their homogeneous configuration based solely on the linear neuron model. Therefore, while they learn very well those problems with a monotonous, relatively simple and linearly separable solution space, they may entirely fail to do so when the solution space is highly nonlinear and complex. The same is true for conventional Convolutional Neural Networks (CNNs), which share the same linear neuron model with two additional constraints (local connections and weight sharing). It is therefore not surprising that in many challenging problems only deep CNNs with massive complexity and depth can achieve the required diversity and learning performance. In order to address this drawback and also to accomplish a more generalized model over the convolutional neurons, this study proposes a novel network model, called Operational Neural Networks (ONNs), which can be heterogeneous and encapsulate neurons with any set of operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. Finally, a novel training method is formulated to back-propagate the error through the operational layers of ONNs. Experimental results over highly challenging problems demonstrate the superior learning capabilities of ONNs even with few neurons and hidden layers. Comment: 21 pages
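
The generalized neuron described in this abstract can be illustrated with a minimal NumPy sketch. It assumes a 1-D input and splits the neuron into a nodal operator (applied to each input/weight pair), a pool operator (aggregating each window) and an activation. The function name `operational_neuron` and the specific operator choices (sine nodal operator, median pool) are illustrative assumptions, not the operator library used by the authors. Choosing multiplication and summation recovers the ordinary convolutional neuron.

```python
import numpy as np

def operational_neuron(x, w, b, nodal, pool, activation):
    """Slide a kernel over a 1-D signal, applying nodal and pool operators per window."""
    k = len(w)
    # Each row is one receptive field of length k; shape (len(x) - k + 1, k).
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    z = pool(nodal(windows, w), axis=1) + b
    return activation(z)

# Hypothetical nodal operators, chosen only for illustration.
def nodal_mul(x, w):
    return x * w            # linear case: the usual convolutional neuron

def nodal_sin(x, w):
    return np.sin(x * w)    # a non-linear alternative

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0, 0.5])

# Multiplication + summation reproduces a standard (cross-correlation) conv neuron.
conv_out = operational_neuron(x, w, 0.0, nodal_mul, np.sum, np.tanh)

# Swapping the operators yields a different, non-linear neuron with the same weights.
onn_out = operational_neuron(x, w, 0.0, nodal_sin, np.median, np.tanh)
```

In a heterogeneous network, different neurons or layers would be given different `(nodal, pool, activation)` triples; how those triples are searched for is the subject of the abstracts above and is not modeled here.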

    Efficient Design, Training, and Deployment of Artificial Neural Networks

    Over the last decade, artificial neural networks, especially deep neural networks, have emerged as the main modeling tool in Machine Learning, allowing us to tackle an increasing number of real-world problems in various fields, most notably in computer vision, natural language processing, and biomedical and financial analysis. The success of deep neural networks can be attributed to many factors, namely the increasing amount of data available, the development of dedicated hardware, the advancements in optimization techniques, and especially the invention of novel neural network architectures. Nowadays, state-of-the-art neural networks that achieve the best performance in any field are usually formed by several layers, comprising millions or even billions of parameters. Despite their spectacular performance, optimizing a single state-of-the-art neural network often requires a tremendous amount of computation, which can take several days using high-end hardware. More importantly, it took several years of experimentation for the community to gradually discover effective neural network architectures, moving from AlexNet and VGGNet to ResNet and then DenseNet. In addition to the expensive and time-consuming experimentation process, deep neural networks, which require powerful processors to operate during the deployment phase, cannot be easily deployed to mobile or embedded devices. For these reasons, improving the design, training, and deployment of deep neural networks has become an important area of research in the Machine Learning field. This thesis makes several contributions in the aforementioned research area, which can be grouped into two main categories. The first category consists of research works that focus on designing neural network architectures that are efficient not only in terms of accuracy but also in terms of computational complexity.
In the first contribution under this category, computational efficiency is first addressed at the filter level through the incorporation of a handcrafted design for convolutional neural networks, which are the basis of most deep neural networks. More specifically, the multilinear convolution filter is proposed to replace the linear convolution filter, which is a fundamental element in a convolutional neural network. The new filter design not only better captures multidimensional structures inherent in CNNs but also requires far fewer parameters to be estimated. While using efficient algebraic transforms and approximation techniques to tackle the design problem can significantly reduce the memory and computational footprint of neural network models, this approach requires a lot of trial and error. In addition, the simple neuron model used in most neural networks nowadays, which only performs a linear transformation followed by a nonlinear activation, cannot effectively mimic the diverse activities of biological neurons. For this reason, the second and third contributions transition from a handcrafted, manual design approach to an algorithmic approach in which the type of transformations performed by each neuron as well as the topology of neural networks are optimized in a systematic and completely data-dependent manner. As a result, the algorithms proposed in the second and third contributions are capable of designing highly accurate and compact neural networks while requiring minimal human effort or intervention in the design process. Although significant progress has been made in reducing the runtime complexity of neural network models on embedded devices, most of these methods have been demonstrated on powerful embedded devices, which are costly in applications that require large-scale deployment, such as surveillance systems. In these scenarios, complete on-device processing solutions can be infeasible.
In contrast, hybrid solutions, where some preprocessing steps are conducted on the client side while the heavy computation takes place on the server side, are more practical. The second category of contributions made in this thesis focuses on efficient learning methodologies for hybrid solutions that take into account both the signal acquisition and inference steps. More concretely, the first contribution under this category is the formulation of the Multilinear Compressive Learning framework, in which multidimensional signals are compressively acquired and inference is made based on the compressed signals, bypassing the signal reconstruction step. In the second contribution, the relationships between the input signal resolution, the compression rate, and the learning performance of Multilinear Compressive Learning systems are empirically and systematically analyzed, leading to the discovery of a surrogate performance indicator that can be used to approximately rank the learning performances of different sensor configurations without conducting the entire optimization process. Nowadays, many communication protocols provide support for adaptive data transmission to maximize the data throughput and minimize energy consumption depending on the network’s strength. The last contribution of this thesis proposes an extension of the Multilinear Compressive Learning framework with an adaptive compression capability, which enables us to take advantage of the adaptive rate transmission feature in existing communication protocols to maximize the informational content throughput of the whole system. Finally, all methodological contributions of this thesis are accompanied by extensive empirical analyses demonstrating their performance and computational advantages over existing methods in different computer vision applications such as object recognition, face verification, human activity classification, and visual information retrieval.
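
The acquisition step of multilinear compressive sensing can be sketched as follows, under stated assumptions: the multidimensional signal is compressed by multiplying each of its modes by a small sensing matrix, so the measurement stays a tensor and no reconstruction is attempted. The tensor sizes, the random Gaussian sensing matrices, and the helper names `mode_product` and `multilinear_sense` are hypothetical choices for illustration. The learned components of the framework described in the thesis are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target axis to the end
    out = moved @ matrix.T                  # contract it with the sensing matrix
    return np.moveaxis(out, -1, mode)       # restore the original axis order

def multilinear_sense(signal, sensing_matrices):
    """Compress a tensor by applying one sensing matrix per mode."""
    y = signal
    for mode, phi in enumerate(sensing_matrices):
        y = mode_product(y, phi, mode)
    return y

# Assumed sizes: a 32x32x3 signal compressed to an 8x8x1 measurement.
signal = rng.standard_normal((32, 32, 3))
phis = [rng.standard_normal((m, n)) for m, n in [(8, 32), (8, 32), (1, 3)]]

measurement = multilinear_sense(signal, phis)
print(measurement.shape)  # (8, 8, 1)

# Sensing-operator size: separable (per-mode) versus one dense matrix on the vectorized signal.
separable_params = sum(p.size for p in phis)      # 8*32 + 8*32 + 1*3
dense_params = measurement.size * signal.size     # 64 * 3072
```

A downstream classifier would take `measurement` directly as its input. The comparison of `separable_params` with `dense_params` shows why a per-mode sensing operator is far cheaper to store and apply than a dense one of the same compression rate.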