3 research outputs found

    NIPUNA: A Novel Optimizer Activation Function for Deep Neural Networks

    In recent years, deep neural networks with different learning paradigms have been widely employed in applications including medical diagnosis, image analysis, and self-driving vehicles. The activation functions used in a deep neural network strongly affect how the model trains and how reliable it is. The Rectified Linear Unit (ReLU) has emerged as the most popular and widely used activation function. ReLU has some flaws, however: during back-propagation it is active only for units with positive input and zero otherwise. This causes neurons to die (the dying ReLU problem) and a shift in bias. Swish activation functions, unlike ReLU, do not remain stable or move in a single direction. This research proposes a new activation function named NIPUNA for deep neural networks. We test this activation function by training customized convolutional neural networks (CCNN). On benchmark datasets (Fashion MNIST images of clothes and the MNIST dataset of handwritten digits), the contributions are examined and compared with various activation functions. The proposed activation function can outperform traditional activation functions.
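The two baselines named in this abstract can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions of ReLU and Swish; the `beta` parameter and all function names are assumptions made here, and NIPUNA itself is not shown because the listing does not give its formula.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """ReLU: passes positive inputs through, outputs zero otherwise."""
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    """ReLU gradient: zero for non-positive inputs, which is what lets units 'die'."""
    return 1.0 if x > 0.0 else 0.0


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). beta = 1.0 is an assumed default."""
    return x * sigmoid(beta * x)


def swish_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of Swish: s + beta * x * s * (1 - s), with s = sigmoid(beta * x)."""
    s = sigmoid(beta * x)
    return s + beta * x * s * (1.0 - s)


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 2.0):
        print(x, relu(x), relu_grad(x), swish(x), swish_grad(x))
```

For a negative input such as -1.0, the ReLU gradient is exactly zero while the Swish gradient is small but non-zero. Swish is also non-monotonic: its value at -5.0 is higher than at -1.0, which is one plausible reading of the abstract's remark that Swish does not "move in a single direction".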

    Comparison of machine learning approaches for classification of invoices

    Machine learning has become one of the leading sciences shaping the modern world. Several of its disciplines, neural networks in particular, have recently gained a lot of attention because of their widespread applications. Recent advances in technology have produced big data, which has increased the need for larger means of storage, analysis, and utilization. This implies not only the efficient use of available techniques but also a surge in the development of new algorithms and techniques. In this project, three different machine learning approaches were implemented, using the open-source library Keras on TensorFlow, as a proof of concept for the task of intelligent invoice automation. The performance of these approaches, aimed at improved business, was analysed on invoice data from two customers, with two target attributes per customer. The behaviour of the neural network hyper-parameters was investigated empirically using matplotlib and TensorBoard. In the first approach, the standard way of implementing a predictive algorithm with a neural network was followed. In addition, the hyper-parameter search space was fine-tuned, and the resulting model was studied by grid search over those hyper-parameters. The same hyper-parameter strategy was followed in the next two approaches. In the second approach, multi-task learning was used both to achieve a possible further improvement in prediction accuracy and to determine the dependency between the two target attributes. In the third approach, the use of continual learning on invoices for postings was analysed. This investigation, which compares varied machine learning approaches, has broad significance in assessing the currently available algorithms for handling such data, and it suggests means for improvement as well.
    It holds great prospects, including but not limited to the future implementation of such approaches in the domain of finance for improved customer experience, fraud detection, and easier assessment of assets.
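The grid search over hyper-parameters mentioned in this abstract can be illustrated with a short sketch. Everything below is hypothetical: the parameter names, the grid values, and `toy_score`, which merely stands in for training a model and returning its validation accuracy. The project's actual Keras setup is not described in the listing.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` and return the best-scoring one.

    grid     : dict mapping a hyper-parameter name to a list of candidate values
    score_fn : callable taking a dict of hyper-parameters, returning a score
               where higher is better
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Placeholder objective (an assumption): peaks at learning_rate=0.01, hidden_units=64."""
    return -abs(params["learning_rate"] - 0.01) - abs(params["hidden_units"] - 64) / 1000.0


# Hypothetical search space; the real one is not given in the abstract.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_units": [32, 64, 128],
}

best_params, best_score = grid_search(grid, toy_score)

if __name__ == "__main__":
    print(best_params, best_score)
```

In practice `score_fn` would train the network with the given settings and evaluate it on held-out invoices, so the cost of the search grows with the product of the grid sizes.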

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification has matured from using bounding boxes to localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been studied, devised, and extended to suit video FGL. Despite that, video foreground (FG) segmentation remains an intriguing yet appealing task because of its ill-posed nature and myriad applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic (swaying tree branches, a shimmering body of water), when there are illumination variations or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or a moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system depends substantially on how robustly it localizes the VAR, i.e., the FG. The natural question, then, is what the best way to deal with these challenges is. The goal of this thesis is to investigate plausible implementations with real-time performance, from traditional approaches to modern deep learning (DL) models, for FGL that can be applied to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a well-delineated FGL.
    The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using the probability mass function (PMF), temporal median filtering, the fusion of CIEDE2000 color similarity, color distortion, and illumination measures, and the selection of an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the same problem. To that end, three models akin to an encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include, but are not limited to, double encoding - slow decoding feature learning, multi-view receptive field feature fusion, and the incorporation of spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baseline to challenging video sequences, to demonstrate the effectiveness of the proposed DCNNs. The analysis demonstrates their architectural efficiency over other methods, while quantitative and qualitative experiments show that the proposed models perform competitively with the state of the art.
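Temporal median filtering, one of the conventional ingredients named in this abstract, can be sketched as follows. This is a generic illustration and not the thesis's method: frames are assumed to be small grayscale images stored as nested lists, the function names are hypothetical, and a fixed threshold replaces the adaptive threshold the thesis describes.

```python
from statistics import median


def temporal_median_background(frames):
    """Estimate a background image as the per-pixel median over a stack of frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


def foreground_mask(frame, background, threshold):
    """Mark a pixel as foreground (1) when it differs from the background by more than `threshold`.

    A fixed threshold is an assumption made for brevity.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Three nearly static 2x2 frames with slight sensor noise (made-up values).
history = [
    [[10, 10], [10, 10]],
    [[10, 12], [10, 10]],
    [[11, 10], [10, 10]],
]
background = temporal_median_background(history)

# A new frame in which a bright object covers the top-right pixel.
new_frame = [[10, 200], [10, 10]]
mask = foreground_mask(new_frame, background, threshold=30)

if __name__ == "__main__":
    print(background)
    print(mask)
```

The median ignores the occasional noisy sample at each pixel, so the background estimate stays stable, and only the pixel covered by the bright object exceeds the threshold.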