277 research outputs found

    Laser Scanner: eSafety & ITS Applications

    The Understanding of Human Activities by Computer Vision Techniques

    This thesis proposes novel frameworks for learning human activities and classifying them into categories. Although this field has been widely studied by the computer vision community, important difficulties remain unsolved. First, the literature on computer vision techniques for learning human activities from few training sequences is scarce and reports poor results [1][2]. Yet this kind of learning is critical in several scenarios. For instance, a newly deployed recognition system needs a long time to acquire new training sequences, so learning from limited examples can accelerate its operational launch. The detection of abnormal behaviour, examples of which are hard to obtain, can also benefit from the same techniques. Existing approaches, such as cross-domain implementations or the use of invariant features, do not consider the information specific to the target scenario, which is useful for reducing clutter and improving the results. Systems trained with scarce information face two main problems: on the one hand, the training process may suffer from numerical instabilities while estimating the model parameters; on the other hand, the model lacks representative information coming from a diverse set of activity classes. We have dealt with these problems by proposing novel approaches for learning human activities from a single example, known as one-shot learning. Our proposals are generative approaches derived from Hidden Markov Models [3][4], since each activity class must be learned from only one example. In addition, we have transferred information from sources external to the scenario in order to introduce diverse information into the models [5]. This thesis explains our proposals and shows how they achieve state-of-the-art results on three public datasets [6][7][8].
    Second, we have studied the recognition of human activities in unconstrained scenarios. In this case the training and evaluation scenarios need not coincide, so the clutter reduction mentioned above does not apply. On the other hand, any labelled video can be used for training, regardless of the scenario it comes from. This freedom allows videos to be extracted from any source, such as the Internet, removing the constraint on the number of training examples. With plenty of training examples, both generative and discriminative methods can be used; at the time this thesis was written, the best state-of-the-art results were obtained with discriminative ones. However, most proposals do not consider the long-term temporal information of the activities [9]. This information is critical for distinguishing activities where the order of sub-actions matters, and may be useful in other comparisons as well [10]. We have therefore designed a framework that incorporates this information into a Support Vector Machine. The method also allows some flexibility in sequence alignment, a useful feature when the activity segmentation is not exact. Using this framework we have obtained state-of-the-art results on four challenging public datasets with unconstrained scenarios [11][12][13][14].
    The work carried out in this thesis has led to three articles in first-quartile journals [15][16][17], two already published and one submitted. In addition, eight papers have been published in international conferences and one in a national conference [18][19][20][21][22][23][24][25][26].
    [1] Seo, H. J. and Milanfar, P. Action recognition from one example. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):867–882. (2011)
    [2] Yang, Y., Saleemi, I., and Shah, M. Discovering motion primitives for unsupervised grouping and one-shot learning of human actions, gestures, and expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1635–1648. (2013)
    [3] Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286. (1989)
    [4] Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA. (2006)
    [5] Cook, D., Feuz, K., and Krishnan, N. Transfer learning for activity recognition: a survey. Knowledge and Information Systems, pages 1–20. (2013)
    [6] Schuldt, C., Laptev, I., and Caputo, B. Recognizing human actions: a local SVM approach. In International Conference on Pattern Recognition (ICPR). (2004)
    [7] Weinland, D., Ronfard, R., and Boyer, E. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3):249–257. (2006)
    [8] Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2247–2253. (2007)
    [9] Wang, H. and Schmid, C. Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV). (2013)
    [10] Choi, J., Wang, Z., Lee, S.-C., and Jeon, W. J. A spatio-temporal pyramid matching for video retrieval. Computer Vision and Image Understanding, 117(6):660–669. (2013)
    [11] Oh, S., Hoogs, A., Perera, A., Cuntoor, N., Chen, C.-C., Lee, J. T., Mukherjee, S., Aggarwal, J. K., Lee, H., Davis, L., Swears, E., Wang, X., Ji, Q., Reddy, K., Shah, M., Vondrick, C., Pirsiavash, H., Ramanan, D., Yuen, J., Torralba, A., Song, B., Fong, A., Roy-Chowdhury, A., and Desai, M. A large-scale benchmark dataset for event recognition in surveillance video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3153–3160. (2011)
    [12] Niebles, J. C., Chen, C.-W., and Fei-Fei, L. Modeling temporal structure of decomposable motion segments for activity classification. In European Conference on Computer Vision (ECCV), pages 392–405. (2010)
    [13] Reddy, K. K. and Shah, M. Recognizing 50 human action categories of web videos. Machine Vision and Applications, 24(5):971–981. (2013)
    [14] Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. HMDB: a large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV). (2011)
    [15] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. One-shot learning of human activity with an MAP adapted GMM and simplex-HMM. IEEE Transactions on Cybernetics, PP(99):1–12. (2016)
    [16] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. A time flexible kernel framework for video-based activity recognition. Image and Vision Computing, 48–49:26–36. (2016)
    [17] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. Extended Study for One-shot Learning of Human Activity by a Simplex-HMM. IEEE Transactions on Cybernetics. (Submitted)
    [18] Orrite, C., Rodriguez, M., and Medrano, C. One-shot learning of temporal sequences using a distance dependent Chinese Restaurant Process. In Proceedings of the 23rd International Conference on Pattern Recognition, ICPR. (December 2016)
    [19] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Spectral Clustering Using Friendship Path Similarity. In Proceedings of the 7th Iberian Conference, IbPRIA. (June 2015)
    [20] Orrite, C., Soler, J., Rodriguez, M., Herrero, E., and Casas, R. Image-based location recognition and scenario modelling. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications, VISAPP. (March 2015)
    [21] Castán, D., Rodríguez, M., Ortega, A., Orrite, C., and Lleida, E. Vivolab and cvlab - mediaeval 2014: Violent scenes detection affect task. In Working Notes Proceedings of the MediaEval. (October 2014)
    [22] Orrite, C., Rodriguez, M., Herrero, E., Rogez, G., and Velastin, S. A. Automatic segmentation and recognition of human actions in monocular sequences. In Proceedings of the 22nd International Conference on Pattern Recognition, ICPR. (August 2014)
    [23] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Transfer learning of human poses for action recognition. In 4th International Workshop of Human Behavior Understanding (HBU). (October 2013)
    [24] Rodriguez, M., Orrite, C., and Medrano, C. Human action recognition with limited labelled data. In Actas del III Workshop de Reconocimiento de Formas y Analisis de Imagenes, WSRFAI. (September 2013)
    [25] Orrite, C., Monforte, P., Rodriguez, M., and Herrero, E. Human Action Recognition under Partial Occlusions. In Proceedings of the 6th Iberian Conference, IbPRIA. (June 2013)
    [26] Orrite, C., Rodriguez, M., and Montañes, M. One sequence learning of human actions. In 2nd International Workshop of Human Behavior Understanding (HBU). (November 2011)
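    As an illustration of the generative, one-model-per-class idea in this abstract, the following is a minimal sketch and not the thesis's own method: it uses a plain discrete Hidden Markov Model with the standard scaled forward algorithm, whereas the thesis works with models derived from HMMs. The function names, the toy parameters and the class labels are all assumptions made for the example.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM.

    Uses the scaled forward algorithm. start[s], trans[p][s] and
    emit[s][symbol] are probabilities; obs is a non-empty list of symbols.
    """
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Return the class whose model gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical toy models: two left-to-right HMMs over symbols {0, 1}.
TRANS = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "zeros_then_ones": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "ones_then_zeros": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1], MODELS))
```

    In a one-shot setting each entry of MODELS would be estimated from a single example sequence, which is where the numerical instabilities mentioned in the abstract arise; the sketch sidesteps that step by fixing the parameters by hand.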

    Interactive models for latent information discovery in satellite images

    The recent increase in Earth Observation (EO) missions has resulted in unprecedented volumes of multi-modal data to be processed, understood, used and stored in archives. The advanced capabilities of satellite sensors become useful only when translated into accurate, focused information, ready to be used by decision makers from various fields. Two key problems emerge when trying to bridge the gap between research, science and multi-user platforms: (1) current systems for data access permit only queries by geographic location, time of acquisition and type of sensor, but this information is often less important than the latent, conceptual content of the scenes; (2) at the same time, many new applications relying on EO data require knowledge of complex image processing and computer vision methods for understanding and extracting information from the data. This dissertation designs two important concept modules of a theoretical image information mining (IIM) system for EO: semantic knowledge discovery in large databases and data visualization techniques. These modules allow users to discover and extract relevant conceptual information directly from satellite images and to generate an optimum visualization for this information. The first contribution of this dissertation is a theoretical solution that bridges the gap and discovers the semantic rules between the output of state-of-the-art classification algorithms and the semantic, human-defined, manually applied terminology of cartographic data. The set of rules explains the contents of satellite images in latent, linguistic concepts and links the low-level machine language to the high-level human understanding. The second contribution of this dissertation is an adaptive visualization methodology used to assist the image analyst in understanding the satellite image through optimum representations and to offer cognitive support in discovering relevant information in the scenes.
It is an interactive technique applied to discover the optimum combination of three spectral features of a multi-band satellite image that enhances the visualization of learned targets and phenomena of interest. The visual mining module is essential for an IIM system because all EO-based applications involve several steps of visual inspection, and the final decision about the information derived from satellite data is always made by a human operator. To ensure maximum correlation between the requirements of the analyst and the possibilities of the computer, the visualization tool models the human visual system and ensures that a change in the image space is equivalent to a change in the perception space of the operator. This thesis presents novel concepts and methods that help users access and discover latent information in archives and visualize satellite scenes in an interactive, human-centered and information-driven workflow.

    Obstacle and Change Detection Using Monocular Vision

    We explore change detection using videos of change-free paths to detect any changes that occur while travelling the same paths in the future. This approach benefits from learning the background model of the given path as preprocessing, detecting changes starting from the first frame, and determining the current location in the path. Two approaches are explored: a geometry-based approach and a deep learning approach. In our geometry-based approach, we use feature points to match testing frames to training frames. Matched frames are used to determine the current location within the training video. The frames are then processed by first registering the test frame onto the training frame through a homography of the previously matched feature points. Finally, a comparison is made to determine changes by using a region of interest (ROI) of the direct path of the robot in both frames. This approach performs well in many tests with various floor patterns, textures and complexities in the background of the path. In our deep learning approach, we use an ensemble of unsupervised dimensionality reduction models. We first extract feature points within a ROI and extract small frame samples around the feature points. The frame samples are used as training inputs and labels for our unsupervised models. The approach aims at learning a compressed feature representation of the frame samples in order to have a compact representation of background. We use the distribution of the training samples to directly compare the learned background to test samples with a classification of background or change using a majority vote. This approach performs well using just two models in the ensemble and achieves an overall accuracy of 98.0% with a 4.1% improvement over the geometry-based approach
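    The majority-vote step of the ensemble can be sketched as follows. This is a hedged illustration only: the abstract does not say how each model turns a test sample into a background or change decision, so the per-model score, the per-model threshold and the tie-breaking rule below are all assumptions, as are the function names.

```python
from collections import Counter


def majority_vote(votes, tie_label="change"):
    """Return the most common label; fall back to tie_label on a tie.

    Resolving ties as "change" is an assumption (it errs on the side of
    reporting a possible obstacle), not something stated in the abstract.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    winners = [label for label, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else tie_label


def classify_sample(scores, thresholds):
    """Combine per-model decisions for one frame sample.

    scores[i] is a hypothetical dissimilarity between the sample and the
    background learned by model i; the model votes "change" when the score
    exceeds its threshold.
    """
    votes = ["change" if s > t else "background" for s, t in zip(scores, thresholds)]
    return majority_vote(votes)


if __name__ == "__main__":
    print(classify_sample([0.9, 0.2, 0.8], [0.5, 0.5, 0.5]))
```

    With only two models in the ensemble, as in the reported result, disagreements are always ties, so the choice of tie_label matters more than it would with an odd number of models.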

    Motion-capture-based hand gesture recognition for computing and control

    This dissertation focuses on the study and development of algorithms that enable the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Evaluations of proposed methods focus on examining their generalization to users not encountered during system training. In an initial exploratory study, we compare various classification algorithms based upon multiple interpretations and feature transformations of point sets, including those based upon aggregate features (e.g. mean) and a pseudo-rasterization of the capture space. We find aggregate feature classifiers to be balanced across multiple users but relatively limited in maximum achievable accuracy. Certain classifiers based upon the pseudo-rasterization performed best among tested classification algorithms. We follow this study with targeted examinations of certain subproblems. For the first subproblem, we introduce the a fortiori expectation-maximization (AFEM) algorithm for computing the parameters of a distribution from which unlabeled, correlated point sets are presumed to be generated. Each unlabeled point is assumed to correspond to a target with independent probability of appearance but correlated positions. We propose replacing the expectation phase of the algorithm with a Kalman filter modified within a Bayesian framework to account for the unknown point labels which manifest as uncertain measurement matrices. We also propose a mechanism to reorder the measurements in order to improve parameter estimates. In addition, we use a state-of-the-art Markov chain Monte Carlo sampler to efficiently sample measurement matrices. In the process, we indirectly propose a constrained k-means clustering algorithm. Simulations verify the utility of AFEM against a traditional expectation-maximization algorithm in a variety of scenarios. 
In the second subproblem, we consider the application of positive definite kernels and the earth mover's distance (EMD) to our work. Positive definite kernels are an important tool in machine learning that enable efficient solutions to otherwise difficult or intractable problems by implicitly linearizing the problem geometry. We develop a set-theoretic interpretation of EMD and propose earth mover's intersection (EMI), a positive definite analog to EMD. We offer proof of EMD's negative definiteness and provide necessary and sufficient conditions for EMD to be conditionally negative definite, including approximations that guarantee negative definiteness. In particular, we show that EMD is related to various min-like kernels. We also present a positive definite preserving transformation that can be applied to any kernel and can be used to derive positive definite EMD-based kernels, and we show that the Jaccard index is simply the result of this transformation applied to set intersection. We then evaluate kernels based on EMI and the proposed transformation versus EMD in various computer vision tasks and show that EMD is generally inferior even with indefinite kernel techniques. Finally, we apply deep learning to our problem. We propose neural network architectures for hand posture and gesture recognition from unlabeled marker sets in a coordinate system local to the hand. As a means of ensuring data integrity, we also propose an extended Kalman filter for tracking the rigid pattern of markers on which the local coordinate system is based. We consider fixed- and variable-size architectures including convolutional and recurrent neural networks that accept unlabeled marker input. We also consider a data-driven approach to labeling markers with a neural network and a collection of Kalman filters. 
Experimental evaluations with posture and gesture datasets show promising results for the proposed architectures with unlabeled markers, which outperform the alternative data-driven labeling method
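    The remark about the Jaccard index can be made concrete with a small sketch. The abstract does not spell out the transformation, so the form used here, k(x, y) / (k(x, x) + k(y, y) - k(x, y)), is an assumption chosen because it reproduces the Jaccard index when k is the size of a set intersection; the function names are hypothetical.

```python
def intersection_kernel(a, b):
    """Size of the intersection of two finite sets."""
    return len(set(a) & set(b))


def normalize_kernel(kernel):
    """Wrap a kernel k as k(x, y) / (k(x, x) + k(y, y) - k(x, y)).

    Assumed form of the transformation; the dissertation may define it
    differently.
    """
    def transformed(x, y):
        kxy = kernel(x, y)
        denom = kernel(x, x) + kernel(y, y) - kxy
        # Convention chosen here: two empty inputs give 0.0 rather than 1.0.
        return kxy / denom if denom else 0.0
    return transformed


# Applying the wrapper to set intersection gives |A & B| / |A | B|.
jaccard = normalize_kernel(intersection_kernel)

if __name__ == "__main__":
    print(jaccard({1, 2, 3}, {2, 3, 4}))
```

    The denominator equals the size of the union, since |A| + |B| - |A ∩ B| = |A ∪ B|, which is why the wrapped intersection kernel coincides with the Jaccard index.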

    Radial Basis Function Neural Network in Identifying The Types of Mangoes

    Mango (Mangifera indica L.) is a fruit plant species whose varieties differ in color and texture. Mango varieties are usually identified manually, by direct visual inspection of the fruit, and the subjectivity of human judgment leads to inconsistent determinations. Information technology makes it possible to classify mangoes by their texture with a computerized system. In this work, images are acquired with a camera and then processed. Texture features are extracted with Gabor filters from samples of several mango varieties, and the resulting feature values are passed to an artificial neural network (ANN). The Radial Basis Function method produces weight values, which are then used to classify the mango variety. The test accuracy reported for this combination of feature extraction and learning methods is 100%

    Deep Learning Detected Nutrient Deficiency in Chili Plant

    Chili is a staple commodity that affects the Indonesian economy because of high market demand. In June 2019, chili accounted for 0.20% of Indonesia's 0.55% inflation. One contributing factor is crop failure due to nutrient deficiency. This study explores deep learning technology in agriculture to help farmers diagnose their plants and prevent nutrient deficiency. The system uses the RCNN algorithm as its architecture. The dataset contains 270 samples in 4 categories and consists of primary data collected from curly chili plants in Boyolali Regency, Indonesia. The resulting system recognizes nutrient deficiencies in chili plants from input images, with a best testing accuracy of 82.61% and a best mAP value of 15.57%