    An efficient semi-sigmoidal non-linear activation function approach for deep neural networks

    A non-linear activation function is one of the key contributing factors to the success of Deep Learning (DL). Since the revival of DL in 2012, the Rectified Linear Unit (ReLU) has been regarded by the community as a de facto standard for many DL models. Despite its popularity, however, ReLU has several shortcomings that can result in inefficient learning in DL models. These shortcomings are: 1) the inherent negative cancellation property of ReLU removes all negative inputs and causes massive information loss in the network; 2) the derivative of ReLU can cause the dead neuron problem in the networks; 3) the mean activation generated by ReLU is highly positive and leads to a bias shift effect in the network layers; 4) the inherent multilinear structure of ReLU restricts the nonlinear capability of the networks; 5) the predefined nature of ReLU limits the flexibility of the networks. To address these shortcomings, this study proposes a new family of activation functions based on the Semi-sigmoidal (Sig) approach. Based on this approach, three variants of activation functions are introduced, namely, Shifted Semi-sigmoidal (SSig), Adaptive Shifted Semi-sigmoidal (ASSig), and Bi-directional Adaptive Shifted Semi-sigmoidal (BiASSig). The proposed activation functions were tested against ReLU (the baseline) and state-of-the-art methods using eight Deep Neural Networks (DNNs) on seven benchmark image datasets. Further, Adaptive Moment Estimation (ADAM) and Stochastic Gradient Descent (SGD) were selected as optimizers to train the DNNs. The baseline comparison score and mean rank were used to consolidate and analyse the experimental results effectively. The experimental results in terms of the overall baseline comparison score show that SSig, ASSig, and BiASSig obtained scores of 79, 87, and 86 out of 112, respectively, outperforming ReLU in more than 70% of the cases.
In terms of overall mean rank (OMR), ReLU ranked tenth (10th), whereas SSig, ASSig, and BiASSig ranked fifth (5th), first (1st), and second (2nd), respectively, outperforming ReLU and the other compared methods.
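
The first three ReLU shortcomings listed in this abstract can be observed numerically. The sketch below is illustrative only: it assumes zero-mean, unit-variance Gaussian pre-activations, and the helper names (relu, relu_grad, summarize) are hypothetical. It does not implement SSig, ASSig, or BiASSig, whose formulas are not given in the abstract.

```python
import random


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0 by convention)."""
    return 1.0 if x > 0.0 else 0.0


def summarize(n=10_000, seed=0):
    """Collect simple statistics of ReLU on synthetic pre-activations."""
    rng = random.Random(seed)
    # Assumption: pre-activations are drawn from a standard normal distribution.
    pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
    act = [relu(x) for x in pre]
    grads = [relu_grad(x) for x in pre]
    return {
        # Negative cancellation: share of inputs mapped to exactly zero.
        "zeroed_fraction": sum(1 for a in act if a == 0.0) / n,
        # Bias shift: the input mean is near zero, the output mean is positive.
        "mean_input": sum(pre) / n,
        "mean_activation": sum(act) / n,
        # Zero gradient wherever the input is non-positive; a unit stuck in
        # this regime for every input receives no updates (a "dead" neuron).
        "zero_grad_fraction": sum(1 for g in grads if g == 0.0) / n,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Under the Gaussian assumption roughly half of the inputs are zeroed, and the mean activation is clearly positive even though the mean input is close to zero.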

    Activation functions: comparison of trends in practice and research for deep learning

    Deep neural networks (DNNs) have been successfully used in diverse emerging domains to solve complex real-world problems, with many more deep learning (DL) architectures being developed to date. To achieve these state-of-the-art (SOTA) performances, DL architectures use activation functions (AFs) to perform diverse computations between the hidden layers and the output layers of any given DL architecture. This paper presents a survey of the existing AFs used in deep learning applications and highlights the recent trends in the use of AFs for DL applications. The novelty of this paper is that it compiles the majority of the AFs used in DL and outlines the current trends in the application and usage of these functions in practical deep learning deployments against the SOTA research results. This compilation will aid in making effective decisions when choosing the most suitable AF for a given application, ready for deployment. This paper is timely because the majority of research papers on AFs highlight similar works and results, whereas this paper is the first to compile the trends in AF applications in practice against the research results found in the DL literature to date.

    Essays on Predictive Analytics in E-Commerce

    The motivation for this dissertation is twofold: on the one hand, the dissertation is methodologically oriented and develops new statistical approaches and algorithms for machine learning. At the same time, it is practically oriented and focuses on the concrete use case of product returns in online retail. The “data explosion”, caused by the fact that the costs of storing and processing large amounts of data have fallen significantly (Bhimani and Willcocks, 2014), and the new technologies resulting from it represent the greatest discontinuity for business practice and business research since the development of the Internet (Agarwal and Dhar, 2014). In particular, business intelligence (BI) has been identified as an important research topic for practitioners and academics in the field of information systems (IS) (Chen et al., 2012). Machine learning has been successfully applied to a range of BI problems, such as sales forecasting (Choi et al., 2014; Sun et al., 2008), wind power generation forecasting (Wan et al., 2014), prediction of the disease progression of hospital patients (Liu et al., 2015), fraud detection (Abbasi et al., 2012), and recommender systems (Sahoo et al., 2012). However, there is little research that addresses machine learning questions with specific reference to BI: although existing algorithms are sometimes modified to adapt them to a particular problem (Abbasi et al., 2010; Sahoo et al., 2012), IS research generally confines itself to applying existing algorithms, developed for questions other than BI, to BI problems (Abbasi et al., 2010; Sahoo et al., 2012). The first major goal of this dissertation is to contribute to closing this gap.
This dissertation focuses on the important BI problem of product returns in online retail to illustrate and practically apply the proposed concepts. Many online retailers are not profitable (Rigby, 2014), and product returns are an important cause of this problem (Grewal et al., 2004). Beyond cost aspects, product returns are problematic from an ecological perspective. In logistics research there is broad consensus that the “last mile” of the supply chain, namely when the product is delivered to the customer's front door, is the most CO2-intensive (Browne et al., 2008; Halldórsson et al., 2010; Song et al., 2009). When products are returned, this energy-intensive step is repeated, which reduces the sustainability and environmental friendliness of the online retail business model relative to traditional retail. However, online retailers cannot simply prohibit product returns, since returns are an important part of their business model: the option to return products has positive effects on customer satisfaction (Cassill, 1998), purchasing behaviour (Wood, 2001), future purchasing behaviour (Petersen and Kumar, 2009), and customers' emotional reactions (Suwelack et al., 2011). A promising approach is to focus on impulsive and compulsive (LaRose, 2001) as well as fraudulent purchasing behaviour (Speights and Hilinski, 2005; Wachter et al., 2012). The current academic literature on the topic contains no such strategies. Most strategies do not distinguish between wanted and unwanted returns (Walsh et al., 2014). The second goal of this dissertation is therefore to develop the basis for a strategy of prediction and intervention with which consumption behaviour with a high return probability can be identified in advance and addressed in time.
In this dissertation, several predictive models are developed, on the basis of which it is demonstrated that the strategy yields considerable cost savings, assuming moderately effective intervention strategies.

    Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization

    Owing to a lack of data, overfitting is ubiquitous in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes in order to carry out end-to-end training. We evaluate the effectiveness of advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale datasets and two large-scale datasets) with various base models. Advanced dropout outperforms all the referred techniques on all the datasets. We further compare the effectiveness ratios and find that advanced dropout achieves the highest ratio in most cases. Next, we conduct a set of analyses of dropout rate characteristics, including convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout rate generation without an explicit distribution. In addition, its ability to prevent overfitting is evaluated and confirmed. Finally, we extend the application of advanced dropout to uncertainty inference, network pruning, text classification, and regression, where the proposed advanced dropout is also superior to the corresponding referred methods.
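
For orientation, the fixed-rate baseline that adaptive dropout methods depart from can be sketched in a few lines. The code below is ordinary inverted Bernoulli dropout with a hand-chosen rate; it is not the advanced dropout technique, whose distribution, prior, and variational training are not specified in this abstract. The function name and signature are hypothetical.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted Bernoulli dropout with a fixed, hand-chosen rate.

    Each value is zeroed with probability `rate`; survivors are scaled by
    1 / (1 - rate) so the expected output equals the input.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [1.0] * 10
    print(dropout(activations, rate=0.5, rng=rng))
    print(dropout(activations, rate=0.5, rng=rng, training=False))
```

An adaptive scheme, as described in the abstract, would learn the rate (or a distribution over masks) during training instead of fixing it in advance.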

    Enhancing remanufacturing automation using deep learning approach

    In recent years, remanufacturing has attracted significant interest from researchers and practitioners seeking to improve efficiency through maximum value recovery of products at end-of-life (EoL). It is a process of returning used products, known as EoL products, to as-new condition with a warranty matching or exceeding that of new products. However, these remanufacturing processes are complex and time-consuming to carry out manually, which reduces productivity and poses dangers to personnel. These challenges call for automating the various remanufacturing process stages to achieve higher throughput and reduced lead time, cost and environmental impact while maximising economic gains. In addition, as highlighted by various research groups, there is currently a shortage of adequate remanufacturing-specific technologies for achieving full automation. This research explores automating remanufacturing processes to improve competitiveness by analysing and developing deep learning-based models for automating different stages of the remanufacturing processes. Analysing deep learning algorithms represents a viable option for investigating and developing technologies capable of overcoming the outlined challenges. Deep learning involves using artificial neural networks to learn high-level abstractions in data. Deep learning (DL) models are inspired by the human brain and have produced state-of-the-art results in pattern recognition, object detection and other applications. The research further investigates empirical data on torque converter components recorded at a remanufacturing facility in Glasgow, UK, using in-case and cross-case analysis to evaluate the remanufacturing inspection, sorting, and process control applications. Nevertheless, the developed algorithm helped capture, pre-process, train, deploy and evaluate the performance of the respective processes.
The experimental evaluation of the in-case and cross-case analysis using model prediction accuracy, misclassification rate, and model loss shows that the developed models achieved a high prediction accuracy of above 99.9% across the sorting, inspection and process control applications. Furthermore, a low model loss between 3 × 10^-3 and 1.3 × 10^-5 was obtained, alongside a misclassification rate between 0.01% and 0.08% across the three applications investigated, thereby highlighting the capability of the developed deep learning algorithms to perform sorting, process control and inspection in remanufacturing. The results demonstrate the viability of adopting deep learning-based algorithms in automating remanufacturing processes, achieving safer and more efficient remanufacturing. Finally, this research is unique because it is the first to investigate using deep learning and qualitative torque-converter image data for modelling remanufacturing sorting, inspection and process control applications. It also delivers a custom computational model that has the potential to enhance remanufacturing automation when utilised. The findings and publications also benefit both academics and industrial practitioners.

    Automatic transcription of music using deep learning techniques

    Music transcription is the problem of detecting the notes that are being played in a musical piece. This is a difficult task that only trained people are capable of performing. Because of its difficulty, there has been great interest in automating it. However, automatic music transcription encompasses several fields of research, such as digital signal processing, machine learning, music theory and cognition, pitch perception and psychoacoustics. All of this makes automatic music transcription a hard problem to solve. In this work we present a novel approach to automatically transcribing piano pieces using deep learning techniques. We take advantage of deep learning techniques to build several classifiers, each one responsible for detecting only one musical note. In theory, this division of work should enhance the ability of each classifier to transcribe. Apart from that, we also apply two additional stages, pre-processing and post-processing, to improve the efficiency of our system. The pre-processing stage aims at improving the quality of the input data before the classification/transcription stage, while the post-processing stage aims at fixing errors introduced during the classification stage. Initially, preliminary experiments were performed to fine-tune our model in all three stages: pre-processing, classification and post-processing. The experimental setup, using those optimized techniques and parameters, is shown, and a comparison is given with two other state-of-the-art works that use the same dataset and the same deep learning technique but follow a different approach. By different approach we mean that a single neural network is used to detect all the musical notes rather than one neural network per note. Our approach was able to surpass these works in frame-based metrics, while reaching close results in onset-based metrics, demonstrating the feasibility of our approach.
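
Frame-based evaluation of this kind is commonly computed by comparing a reference piano roll with the predicted one, cell by cell. The sketch below shows one common definition (micro-averaged precision, recall and F1 over all frame and note cells); the abstract does not state the exact metric used, so the formula, the function name, and the toy data are assumptions. Each column of a frame corresponds to the output of one per-note binary classifier.

```python
def frame_metrics(reference, predicted):
    """Frame-based precision, recall and F1 for binary piano rolls.

    Both arguments are lists of frames; each frame is a list of 0/1 flags,
    one flag per note (i.e. one per single-note classifier).
    """
    if len(reference) != len(predicted):
        raise ValueError("piano rolls must have the same number of frames")
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        if len(ref_frame) != len(pred_frame):
            raise ValueError("frames must cover the same set of notes")
        for ref_on, pred_on in zip(ref_frame, pred_frame):
            if ref_on and pred_on:
                tp += 1  # note correctly detected in this frame
            elif pred_on:
                fp += 1  # note predicted but not played
            elif ref_on:
                fn += 1  # note played but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: 3 frames, 3 notes (hypothetical data).
    reference = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
    predicted = [[1, 0, 0], [1, 0, 1], [0, 1, 0]]
    print(frame_metrics(reference, predicted))
```

Onset-based metrics differ in that they score whole note events (typically matched within a time tolerance) rather than individual frames, which is why the two families of scores can diverge.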
