
    Novel Trends in Scaling Up Machine Learning Algorithms

    Big Data has been a catalyst for the Machine Learning (ML) area, forcing us to rethink existing strategies in order to create innovative solutions that push the field forward. This paper presents an overview of the strategies for using machine learning in Big Data, with emphasis on high-performance parallel implementations on many-core hardware. The rationale is to increase the practical applicability of ML implementations to large-scale data problems. The common underlying thread has been the recent progress in usability, cost effectiveness and diversity of parallel computing platforms, specifically Graphics Processing Units (GPUs), tailored for a broad set of data analysis and Machine Learning tasks. In this context, we provide the main outcomes of a GPU Machine Learning Library (GPUMLib) framework, which empowers researchers to tackle larger and more complex problems by using high-performance implementations of well-known ML algorithms. Moreover, we attempt to give insights into the future trends of Big Data Analytics and the challenges lying ahead.

    Real-Time WebRTC based Mobile Surveillance System

    Rapid growth in computer vision has been instrumental in advancing image processing techniques and in drawing inferences from images. Combined with the capabilities of deep neural networks, computers can be trained efficiently to automate tasks and yield accurate, robust results quickly. Technological growth has brought such computationally intensive tasks to lighter, lower-end mobile devices, opening up a wide range of possibilities. WebRTC, the open-source web standard, lets us send multimedia data from peer to peer, paving the way for real-time communication over the Web. With this project, we aim to build on one such opportunity: custom object detection through an Android application installed on a mobile phone. Our problem statement is therefore to capture real-time feeds, perform custom object detection, generate inference results, and send intruder alerts when needed. To implement this, we propose a mobile-based, over-the-cloud solution that uses the YOLO algorithm together with OpenCV's DNN module to provide fast and correct inferences. Coupled with an intuitive UI, this ensures that the application is easy to use.

    Deep learning architectures for Computer Vision

    Deep learning has become part of many state-of-the-art systems in multiple disciplines (especially computer vision and speech processing). In this thesis, Convolutional Neural Networks are used to solve the problem of recognizing people in images, both for verification and identification. Two different architectures, AlexNet and VGG19, both winners of the ILSVRC, have been fine-tuned and tested with four datasets: Labeled Faces in the Wild, FaceScrub, YouTubeFaces and Google UPC, a dataset generated at the UPC. Finally, with the features extracted from these fine-tuned networks, several verification algorithms have been tested, including Support Vector Machines, Joint Bayesian and the Advanced Joint Bayesian formulation. The results of this work show that an Area Under the Receiver Operating Characteristic curve of 99.6% can be obtained, close to state-of-the-art performance.
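The verification result above is reported as an area under the ROC curve. As a minimal sketch of what that metric measures, the snippet below computes the AUC as the probability that a randomly chosen same-person pair receives a higher similarity score than a randomly chosen different-person pair. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the thesis.

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive scores higher, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for six face pairs (1 = same person, 0 = different).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The quadratic loop keeps the definition visible; for large evaluation sets a rank-based computation would be the usual choice.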

    Large-Scale Optical Neural Networks based on Photoelectric Multiplication

    Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large ($N \gtrsim 10^6$) networks and can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit- and image-classification reveal a "standard quantum limit" for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests that performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully-connected and convolutional networks. We also present a scheme for back-propagation and training that can be performed in the same hardware. This architecture will enable a new class of ultra-low-energy processors for deep learning. Comment: Text: 10 pages, 5 figures, 1 table. Supplementary: 8 pages, 5 figures, 2 tables
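To put the 50 zJ/MAC figure in context, the sketch below evaluates the Landauer limit k_B T ln 2 for erasing a single bit and expresses the quoted bound as a number of bit erasures. The temperature of 300 K is an assumption, and the abstract does not state how many irreversible bit operations a digital MAC is taken to require, so the snippet only reports the break-even count.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit_joules(temperature_kelvin):
    """Minimum energy dissipated when one bit is irreversibly erased."""
    return K_B * temperature_kelvin * math.log(2)


T_ROOM = 300.0                # assumed operating temperature, not from the abstract
OPTICAL_BOUND = 50e-21        # 50 zJ per MAC, the bound quoted in the abstract

per_bit = landauer_limit_joules(T_ROOM)
break_even_bits = OPTICAL_BOUND / per_bit

print(f"Landauer limit at {T_ROOM:.0f} K: {per_bit * 1e21:.2f} zJ per bit")
print(f"50 zJ corresponds to about {break_even_bits:.1f} bit erasures")
```

Under these assumptions, any digital MAC that irreversibly erases more bits than the break-even count has a Landauer cost above 50 zJ, which is the sense in which the optical bound can fall below the digital limit.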

    Fast On-line Statistical Learning on a GPGPU

    On-line Machine Learning using Stochastic Gradient Descent is an inherently sequential computation. This makes it difficult to improve performance by simply employing parallel architectures. Langford et al. modified the standard stochastic gradient descent approach in a way that opens up the possibility of parallel computation. They also proved that their approach incurs no significant loss in accuracy, and they empirically demonstrated the gain in speed for a pipelined architecture with a few processing units. In this paper we report on applying the Langford et al. approach on a General Purpose Graphics Processing Unit (GPGPU) with a large number of processing units. We accelerate learning by a factor of approximately 4.5 compared to a standard single-threaded approach, with comparable accuracy. We also evaluate the GPU performance for the sequential variant of the algorithm, which has not previously been reported. Finally, we investigate how changes in the number of threads, the number of blocks, and the amount of delay affect the overall performance and accuracy.
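The role of the delay parameter can be illustrated with a small sequential simulation. The sketch below assumes the modification in question is a delayed-gradient update, in which the gradient applied at step t was computed from the weights as they stood `delay` steps earlier; the abstract does not spell this out. The function names, the squared-loss linear model, and all hyperparameters are hypothetical, and the code runs on the CPU rather than reproducing the GPGPU implementation.

```python
import random
from collections import deque


def make_data(n, true_w, seed=1):
    """Noise-free linear data; the last feature is a constant 1 acting as a bias."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        y = sum(wj * xj for wj, xj in zip(true_w, x))
        data.append((x, y))
    return data


def delayed_sgd(data, dim, delay, lr=0.02, epochs=30, seed=0):
    """SGD for squared loss where each gradient is applied `delay` steps late.

    With delay=0 this reduces to ordinary sequential SGD.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pending = deque()  # gradients that have been computed but not yet applied
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            pending.append([err * xj for xj in x])  # gradient at the current weights
            if len(pending) > delay:
                g = pending.popleft()  # gradient computed `delay` steps ago
                w = [wj - lr * gj for wj, gj in zip(w, g)]
    # Gradients still queued at the end are discarded in this sketch.
    return w


true_w = [2.0, -1.0, 0.5]
data = make_data(200, true_w)
w_sequential = delayed_sgd(data, dim=3, delay=0)
w_delayed = delayed_sgd(data, dim=3, delay=4)
print("sequential:", [round(v, 4) for v in w_sequential])
print("delay = 4: ", [round(v, 4) for v in w_delayed])
```

In a parallel setting, the queue stands in for the gradients that other processing units are still computing; larger delays allow more units to work concurrently but generally call for a smaller learning rate.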