362 research outputs found

    Improving Efficiency in Deep Learning for Large Scale Visual Recognition

    Recent large-scale visual recognition methods, in particular deep Convolutional Neural Networks (CNNs), promise to revolutionize many computer-vision-based artificial intelligence applications, such as autonomous driving and online image retrieval systems. One of the main challenges in large-scale visual recognition is the complexity of the corresponding algorithms. This is further exacerbated by the fact that in most real-world scenarios they need to run in real time and on platforms with limited computational resources. This dissertation focuses on improving the efficiency of such large-scale visual recognition algorithms from several perspectives. First, to reduce the complexity of large-scale classification to sub-linear in the number of classes, a probabilistic label tree framework is proposed. A test sample is classified by traversing the label tree from the root node. Each node in the tree is associated with a probabilistic estimate of all the labels. The tree is learned recursively with iterative maximum likelihood optimization. Compared with the hard label partition proposed previously, the probabilistic framework classifies more accurately with similar efficiency. Second, we explore the redundancy of parameters in CNNs and employ sparse decomposition to significantly reduce both the number of parameters and the computational complexity. Both inter-channel and inner-channel redundancy are exploited to achieve more than 90% sparsity with an approximately 1% drop in classification accuracy. We also propose an efficient CPU-based sparse matrix multiplication algorithm to reduce the actual running time of CNN models with sparse convolutional kernels. Third, we propose a multi-stage framework based on CNNs that achieves better efficiency than a single traditional CNN model. By combining a cascade model with the label tree framework, the proposed method divides the input images in both the image space and the label space, and processes each image with the CNN models that are most suitable and efficient for it. The average complexity of the framework is significantly reduced, while the overall accuracy remains the same as that of the single complex model.
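The label-tree traversal described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` class, the linear routers, the four toy labels and the hand-set weights are hypothetical, and the dissertation's learned probabilistic model is not reproduced. The sketch only shows why classification touches one root-to-leaf path rather than every class.

```python
class Node:
    """One node of a toy label tree: a label distribution plus an optional router."""

    def __init__(self, label_probs, weights=None, children=None):
        self.label_probs = label_probs    # dict: label -> probability estimate at this node
        self.weights = weights or []      # one hypothetical linear router vector per child
        self.children = children or []


def classify(root, x):
    """Walk from the root to a leaf, following the highest-scoring child at each node."""
    node, depth = root, 0
    while node.children:
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in node.weights]
        node = node.children[scores.index(max(scores))]
        depth += 1
    # At the leaf, return the most probable label and the number of nodes routed through.
    return max(node.label_probs, key=node.label_probs.get), depth


# Hand-built toy tree over four labels (all numbers are made up).
leaf_cat = Node({"cat": 0.9, "dog": 0.1})
leaf_dog = Node({"cat": 0.1, "dog": 0.9})
leaf_car = Node({"car": 0.8, "bus": 0.2})
leaf_bus = Node({"car": 0.2, "bus": 0.8})
animals = Node({"cat": 0.5, "dog": 0.5}, [(0, 1), (0, -1)], [leaf_cat, leaf_dog])
vehicles = Node({"car": 0.5, "bus": 0.5}, [(0, 1), (0, -1)], [leaf_car, leaf_bus])
root = Node({"cat": 0.25, "dog": 0.25, "car": 0.25, "bus": 0.25},
            [(1, 0), (-1, 0)], [animals, vehicles])

print(classify(root, (1.0, 2.0)))    # routed root -> animals -> leaf_cat
print(classify(root, (-1.0, -0.5)))  # routed root -> vehicles -> leaf_bus
```

In a balanced binary tree over K labels, a traversal like this evaluates about 2·log2(K) router scores instead of K class scores, which is where the sub-linear cost comes from.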

    Modelos de aprendizaje automático en la detección e identificación de personas: una revisión de literatura [Machine learning models in the detection and identification of people: a literature review]

    Introduction: This article is the result of research entitled "Development of a prototype to optimize access conditions to the SENA-Pescadero using artificial intelligence and open-source tools", carried out at the Servicio Nacional de Aprendizaje in 2020. Problem: How can machine learning techniques applied to computer vision processes be identified through a literature review? Objective: To determine the application, advantages and disadvantages of machine learning techniques focused on the detection and identification of people. Methodology: Systematic literature review in 4 high-impact bibliographic and scientific databases, using search filters and information selection criteria. Results: The review identified and defined the following machine learning techniques: Principal Component Analysis, Weak Label Regularized Local Coordinate Coding, Support Vector Machines, Haar Cascade Classifiers, and EigenFaces and FisherFaces, along with their applicability in detection and identification processes. Conclusion: The research identified the main computational intelligence techniques based on machine learning that are applied to the detection and identification of people. Their influence was shown in several application cases, most of which focused on implementing and optimizing access control systems, or on tasks in which identifying people was required to execute a process. Originality: This research studied and defined the main machine learning techniques currently used for the detection and identification of people. Limitations: The systematic review is limited to the information available in the 4 databases consulted, and the amount of information varies as articles are deposited in the databases.

    Improving time series recognition and prediction with networks and ensembles of passive photonic reservoirs

    As performance gains in traditional von Neumann computing slow down, new approaches to computing need to be found. A promising approach for low-power computing at high bitrates is integrated photonic reservoir computing. In the past, though, the feasible reservoir size and computational power of integrated photonic reservoirs have been limited by hardware constraints. An alternative to building larger reservoirs is to combine several small reservoirs so that they match or exceed the performance of a single bigger one. This paper summarizes our efforts to increase the available computational power by combining multiple reservoirs into a single computing architecture. We investigate several possible combination techniques and evaluate their performance using the classic XOR and header recognition tasks as well as the well-known Santa Fe chaotic laser prediction task. Our findings suggest that a new paradigm of feeding a reservoir's output into the readout structure of the next one shows consistently good results for various tasks as well as for both electrical and optical readouts and coupling schemes.
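One way to read "feeding a reservoir's output into the readout structure of the next one" is sketched below. It is a software stand-in under loud assumptions: the reservoirs are small random tanh networks rather than passive photonic circuits, the readouts are plain least-squares fits, the signal is a synthetic sine product rather than the Santa Fe data, and the helper names `run_reservoir` and `fit_readout` are hypothetical. Only in-sample error is computed.

```python
import numpy as np


def run_reservoir(u, n_nodes, seed):
    """Drive a small random tanh reservoir with the 1-D input u; return states of shape (T, n_nodes)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, n_nodes)
    w = rng.normal(size=(n_nodes, n_nodes))
    w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))  # rescale to spectral radius 0.9
    x = np.zeros(n_nodes)
    states = np.empty((len(u), n_nodes))
    for t, u_t in enumerate(u):
        x = np.tanh(w @ x + w_in * u_t)
        states[t] = x
    return states


def fit_readout(features, target):
    """Least-squares linear readout with a bias column; returns (weights, predictions)."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights, design @ weights


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


# Toy task: one-step-ahead prediction of a synthetic signal.
t = np.arange(600)
signal = np.sin(0.2 * t) * np.sin(0.031 * t)
u, target = signal[:-1], signal[1:]

states_a = run_reservoir(u, n_nodes=10, seed=0)
states_b = run_reservoir(u, n_nodes=10, seed=1)

_, out_a = fit_readout(states_a, target)        # readout of reservoir A
_, out_b_alone = fit_readout(states_b, target)  # reservoir B on its own
# Chained variant: reservoir A's readout output becomes an extra input to B's readout.
_, out_chained = fit_readout(np.column_stack([states_b, out_a]), target)

mse_a = mse(out_a, target)
mse_alone = mse(out_b_alone, target)
mse_chained = mse(out_chained, target)
print(mse_a, mse_alone, mse_chained)
```

Because the chained readout's feature set contains both reservoir B's states and reservoir A's output, its in-sample least-squares error cannot exceed that of either readout alone; whether that carries over to held-out data or to optical hardware is exactly what the paper evaluates and this sketch does not.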

    Recognition of transport means in GPS data using machine-learning methods

    Bicycle transport is today one of the most important measures in urban traffic for moving towards more sustainable mobility. Smartphones are now equipped with the Global Positioning System (GPS), which allows cyclists to record their own routes daily through smartphone applications; this is very useful information for traffic and transport planners. A problem arises when the data is invalid because of errors in the measurement or in the GPS signal. The solution is transport mode recognition, which consists of classifying the different existing transport modes on the basis of a set of data. Emerging machine learning techniques allow the development of very powerful models that, according to other studies, recognize means of transport very effectively. Accordingly, this study aims to separate GPS bicycle tracks from the tracks of the other modes (the modes studied are inner-city train (S-Bahn), walk, bike, tram and bus), and also to classify the tracks of each means of transport separately. The key contribution of this study is the design and implementation of a machine learning model capable of classifying the existing modes of transport in urban traffic in the city of Dresden, Germany. For this purpose, a cascading classifier model was designed so that each phase separates the tracks belonging to a different mode, and for each phase the study examines which of the machine learning algorithms used (Decision Tree, Support Vector Machine and Neural Network) performs best. The GPS data was collected with the smartphone application Cyface; the data was then structured, and the features that serve as inputs to the model were calculated and selected. For separating inner-city train (S-Bahn), bike and walk tracks (the first three phases), accuracy values above 98% are obtained with any of the mentioned algorithms. In the fourth phase, which classifies bus and tram tracks, the performance of the model is less outstanding because the two modes have similar characteristics, but it nevertheless reaches an accuracy of 83% using a Multi-layer Perceptron neural network model. The strong performance of the model after the training phase allowed it to be applied to unlabeled tracks, where it achieved an accuracy of 92.6% on the tracks used, making mistakes only in distinguishing between tram and bus tracks.
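The cascading structure described in this abstract, where each phase peels off one transport mode and passes the remaining tracks on, can be sketched as follows. The sketch is illustrative only: the feature names (`mean_speed`, `max_speed`, `rail_distance_m`), the thresholds and the `make_cascade` helper are hypothetical, and simple threshold rules stand in for the trained Decision Tree, Support Vector Machine and Neural Network classifiers used in the study.

```python
def make_cascade(stages, fallback):
    """Build a cascade from (label, predicate) pairs.

    A track gets the label of the first stage whose predicate accepts it;
    tracks rejected by every stage get the fallback label.
    """
    def predict(track):
        for label, is_mode in stages:
            if is_mode(track):
                return label
        return fallback
    return predict


# Phase order follows the abstract: train, bike, walk, then tram versus bus.
# Speeds are in km/h; all thresholds are made-up placeholders.
predict_mode = make_cascade(
    stages=[
        ("train", lambda tr: tr["max_speed"] > 90),
        ("bike", lambda tr: 10 <= tr["mean_speed"] <= 25 and tr["max_speed"] < 40),
        ("walk", lambda tr: tr["mean_speed"] < 7),
        ("tram", lambda tr: tr["rail_distance_m"] < 5),
    ],
    fallback="bus",
)

tracks = [
    {"mean_speed": 45.0, "max_speed": 110.0, "rail_distance_m": 1.0},
    {"mean_speed": 16.0, "max_speed": 32.0, "rail_distance_m": 60.0},
    {"mean_speed": 4.5, "max_speed": 7.0, "rail_distance_m": 80.0},
    {"mean_speed": 22.0, "max_speed": 50.0, "rail_distance_m": 2.0},
    {"mean_speed": 22.0, "max_speed": 50.0, "rail_distance_m": 40.0},
]
predictions = [predict_mode(tr) for tr in tracks]
print(predictions)
```

Ordering the easy separations first means the hardest decision (tram versus bus) only sees tracks that earlier phases have already rejected, which matches the abstract's observation that errors concentrate in that last phase.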

    Robust object representation by boosting-like deep learning architecture

    This paper presents a new deep learning architecture for robust object representation that efficiently combines the proposed synchronized multi-stage feature (SMF) with a boosting-like algorithm. The SMF structure captures a variety of characteristics of the input object by fusing handcrafted features with deep-learned features. With the proposed boosting-like algorithm, training a multi-layer network on the boosted samples converges more stably. We show that our object representation architecture generalizes by applying it to two tasks, namely pedestrian detection and action recognition. Our approach achieves 15.89% and 3.85% reductions in the average miss rate compared with ACF and JointDeep on the largest Caltech dataset, and achieves competitive results on the MSRAction3D dataset.

    Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives
