532 research outputs found

    Computational Learning for Hand Pose Estimation

    Rapid advances in human–computer interaction interfaces have, over the last few years, promised realistic environments for gaming and entertainment. However, traditional input devices such as trackballs, keyboards, or joysticks have been a bottleneck for natural interaction between a human and a computer, because the two degrees of freedom these devices offer cannot suitably emulate interactions in three-dimensional space. Consequently, a comprehensive hand tracking technology is expected to be a smart and intuitive alternative to these input tools, enhancing virtual and augmented reality experiences. In addition, the recent emergence of low-cost depth-sensing cameras has led to the broad use of RGB-D data in computer vision, raising expectations of a full 3D interpretation of hand movements for human–computer interaction interfaces. Although the use of hand gestures or hand postures has become essential for a wide range of applications in computer games and augmented/virtual reality, 3D hand pose estimation is still an open and challenging problem for the following reasons: (i) the hand pose exists in a high-dimensional space because each finger and the palm are associated with several degrees of freedom, (ii) the fingers exhibit self-similarity and often occlude each other, (iii) global 3D rotations make pose estimation more difficult, and (iv) hands occupy only a few pixels in images, and the noise in the acquired data coupled with fast finger movement confounds continuous hand tracking. The success of hand tracking would naturally depend on synthesizing our knowledge of the hand (i.e., geometric shape, constraints on pose configurations) and latent features about hand poses from the RGB-D data stream (i.e., region of interest, key feature points like fingertips and joints, and temporal continuity).
In this thesis, we propose novel methods that leverage the analysis-by-synthesis paradigm and create a prediction model using a population of realistic 3D hand poses. The overall goal of this work is to design a concrete framework so that computers can learn and understand perceptual attributes of human hands (i.e., self-occlusions or self-similarities of the fingers) and to develop a pragmatic solution to the real-time hand pose estimation problem that can be implemented on a standard computer. This thesis can be broadly divided into four parts: learning the hand (i) from recommendations of similar hand poses, (ii) from low-dimensional visual representations, (iii) by hallucinating geometric representations, and (iv) from a manipulated object. Each research work covers our algorithmic contributions to solving the 3D hand pose estimation problem. Additionally, the research work in the appendix proposes a pragmatic technique for applying our ideas to mobile devices with low computational power. Following this structure, we first review the most relevant works in the literature on depth sensor-based 3D hand pose estimation, both with and without a manipulated object. The two prevalent categories of hand pose estimation approaches, model-based methods and appearance-based methods, are discussed in detail. In this chapter, we also introduce some works relevant to deep learning and attempts to achieve efficient compression of the network structure. Next, we describe a synthetic 3D hand model and its motion constraints for simulating realistic human hand movements. The primary research work starts in the following chapter. We discuss our attempts to produce a better estimation model for 3D hand pose estimation by learning hand articulations from recommendations of similar poses. Specifically, the unknown pose parameters for input depth data are estimated by collaboratively learning the known parameters of all neighboring poses.
Subsequently, we discuss deep-learned, discriminative, and low-dimensional features and a hierarchical solution to the stated problem based on the matrix completion framework. This work is further extended by incorporating a function of geometric properties on the surface of the hand described by heat diffusion, which robustly captures both the local geometry of the hand and global structural representations. The problem of the hand's interactions with a physical object is also considered in the following chapter. The main insight is that the interacting object can be a source of constraints on hand poses. In this view, we exploit the dependency of the pose on the shape of the object to learn the discriminative features of the hand–object interaction, rather than losing hand information because of partial or full object occlusions. Subsequently, we present a compressive learning technique in the appendix. Our approach is flexible, enabling us to add more layers and go deeper in the deep learning architecture while keeping the number of parameters the same. Finally, we conclude this thesis by summarizing the presented approaches for hand pose estimation and then propose future directions for further performance improvements through (i) realistically rendered synthetic hand images, (ii) incorporating RGB images as an input, (iii) hand personalization, (iv) the use of unstructured point clouds, and (v) embedding sensing techniques.
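The abstract above describes estimating the unknown pose parameters of an input from the known parameters of neighboring poses. As a rough illustration only, the sketch below uses a plain distance-weighted k-nearest-neighbor average; this is a swapped-in simplification, not the collaborative or matrix-completion method of the thesis. The function name `estimate_pose`, the feature vectors, and the pose values are all hypothetical.

```python
import math


def estimate_pose(query_feat, database, k=3):
    """Estimate pose parameters for a query feature vector.

    database is a list of (feature_vector, pose_parameters) pairs with
    known poses. This is an illustrative assumption, not the thesis method.
    """
    # Keep the k database entries whose features are closest to the query.
    neighbors = sorted(database, key=lambda item: math.dist(query_feat, item[0]))[:k]
    # Closer neighbors get larger weights; the epsilon avoids division by zero.
    weights = [1.0 / (math.dist(query_feat, feat) + 1e-9) for feat, _ in neighbors]
    total = sum(weights)
    dim = len(neighbors[0][1])
    # Weighted average of the neighbors' known pose parameters.
    return [
        sum(w * pose[j] for w, (_, pose) in zip(weights, neighbors)) / total
        for j in range(dim)
    ]


# Hypothetical toy data: 2-D features mapped to 2 pose parameters.
db = [([0.0, 0.0], [10.0, 20.0]), ([1.0, 0.0], [30.0, 40.0]), ([5.0, 5.0], [0.0, 0.0])]
print(estimate_pose([0.5, 0.0], db, k=2))
```

With `k=2` the query sits midway between the first two entries, so both neighbors receive equal weight and the estimate is their average.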

    Classificação de nódulos pulmonares baseada em redes neurais convolucionais profundas em radiografias (Classification of lung nodules based on deep convolutional neural networks in radiographs)

    Advisor: Hélio Pedrini. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Lung cancer, which is characterized by the presence of nodules, is the most common type of cancer around the world, as well as one of the most aggressive and deadliest cancers, accounting for 20% of total cancer mortality. Lung cancer screening can be performed by radiologists analyzing chest X-ray (CXR) images. However, the detection of lung nodules is a difficult task due to their wide variability and to human limitations of memory, distraction and fatigue, among other factors. These difficulties motivate the development of computer-aided diagnosis (CAD) systems to support radiologists in detecting lung nodules. Lung nodule classification is one of the main topics related to CAD systems. Although convolutional neural networks (CNNs) have been shown to perform well on many tasks, there are few explorations of their use for classifying lung nodules in CXR images. In this work, we propose and analyze a pipeline for detecting lung nodules in CXR images that includes lung area segmentation, potential nodule localization, and nodule candidate classification. We present a method for classifying nodule candidates with CNNs trained from scratch. The effectiveness of our method relies on the selection of data augmentation parameters, the design of a specialized CNN architecture, the use of dropout regularization in the network, including in convolutional layers, and on addressing the scarcity of nodule samples relative to background samples by balancing mini-batches on each stochastic gradient descent iteration.
All model selection decisions were made separately, using a CXR subset of the Lung Image Database Consortium and Image Database Resource Initiative (LIDC/IDRI) dataset. We then used all images with nodules in the Japanese Society of Radiological Technology (JSRT) dataset for evaluation. Our experiments showed that CNNs were capable of achieving competitive results when compared to state-of-the-art methods. Our proposal obtained an area under the free-response receiver operating characteristic curve (AUC) of 7.51 considering 10 false positives per image (FPPI), and a sensitivity of 71.4% and 81.0% with 2 and 5 FPPI, respectively.
Master's degree (Mestrado). Computer Science (Ciência da Computação). Master in Computer Science (Mestre em Ciência da Computação). CAPE
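The abstract above mentions balancing mini-batches on each stochastic gradient descent iteration to compensate for the scarcity of nodule samples. A minimal sketch of one way to do this is given below, assuming equal halves per batch and sampling the minority class with replacement. The function name `balanced_minibatches`, the 50/50 ratio, and the index lists are assumptions for illustration; the dissertation's actual sampling scheme is not described in the abstract.

```python
import random


def balanced_minibatches(nodule_idx, background_idx, batch_size, n_batches, seed=0):
    """Yield batches of (sample_index, label) with equal class counts.

    Label 1 marks a nodule sample and label 0 a background sample.
    The 50/50 split is an assumption made for this sketch.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even")
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(n_batches):
        # The scarce class is drawn with replacement so it can fill its half.
        pos = rng.choices(nodule_idx, k=half)
        # The abundant class is drawn without replacement within a batch.
        neg = rng.sample(background_idx, k=half)
        batch = [(i, 1) for i in pos] + [(i, 0) for i in neg]
        rng.shuffle(batch)
        yield batch


# Hypothetical indices: 5 nodule samples versus 100 background samples.
for batch in balanced_minibatches(list(range(5)), list(range(100, 200)), 8, 2):
    print(batch)
```

Each yielded batch would then feed one gradient step, so the network sees both classes equally often regardless of their ratio in the dataset.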

    The Understanding of Human Activities by Computer Vision Techniques

    This thesis provides novel frameworks for learning human activities and classifying them into categories. Although this field has been studied extensively by the computer vision community, important difficulties remain unsolved. First, we have found that the literature on learning human activities from a limited number of training sequences is scarce and reports poor results [1][2]. However, this kind of learning is critical in several scenarios. For instance, right after a recognition system is installed, capturing activity examples is time-consuming, so learning from limited examples may accelerate the operational launch of the system. Moreover, examples of abnormal behaviour are hard to obtain, and their detection may benefit from the same techniques. Some approaches address this problem, such as cross-domain implementations or the use of invariant features, but they ignore information specific to the target scenario, which is useful for reducing clutter and improving the results, and examples of abnormal activities remain hard to obtain. Systems trained with scarce information face two main problems: on the one hand, the training process may suffer from numerical instabilities while estimating the model parameters; on the other hand, the model lacks representative information coming from a diverse set of activity classes. We have dealt with these problems by providing novel approaches for learning human activities from only one example, which is known as one-shot learning. To do so, we have proposed generative approaches derived from Hidden Markov Models [3][4], as each activity class must be learned from only one example. In addition, we have transferred information from sources external to the scenario [5] in order to introduce diverse information into the models. This thesis explains our proposals and shows how these methods achieve state-of-the-art results on three public datasets [6][7][8].
Second, we have studied the recognition of human activities in unconstrained scenarios. In this case, the training and evaluation scenarios need not coincide, so the clutter reduction mentioned above does not apply. On the other hand, any labelled video can be used for training, independently of the target scenario. This freedom allows videos to be extracted from any source, such as the Internet, which removes the restriction on the number of training examples. With plenty of training examples, both generative and discriminative methods can be used; at the time of writing this thesis, the state of the art was achieved by discriminative ones. However, most methods do not take the long-term temporal information of the activities into account [9]. This information is critical when comparing activities where the order of sub-actions is decisive, and may be useful in other comparisons as well [10]. Thus, we have designed a framework that incorporates this information in a discriminative classifier, a Support Vector Machine. In addition, this method introduces some flexibility in the alignment of the sequences being compared, a useful feature when the activity segmentation is not exact. Using this framework we have obtained state-of-the-art results on four challenging public datasets with unconstrained scenarios [11][12][13][14].
The work carried out in this thesis has led to three articles in first-quartile journals [15][16][17], two already published and one submitted. In addition, eight papers have been published at international conferences and one at a national conference [18][19][20][21][22][23][24][25][26].
[1] Seo, H. J. and Milanfar, P. Action recognition from one example. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):867–882. (2011)
[2] Yang, Y., Saleemi, I., and Shah, M. Discovering motion primitives for unsupervised grouping and one-shot learning of human actions, gestures, and expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1635–1648. (2013)
[3] Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286. (1989)
[4] Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA. (2006)
[5] Cook, D., Feuz, K., and Krishnan, N. Transfer learning for activity recognition: a survey. Knowledge and Information Systems, pages 1–20. (2013)
[6] Schuldt, C., Laptev, I., and Caputo, B. Recognizing human actions: a local SVM approach. In International Conference on Pattern Recognition (ICPR). (2004)
[7] Weinland, D., Ronfard, R., and Boyer, E. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3):249–257. (2006)
[8] Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2247–2253. (2007)
[9] Wang, H. and Schmid, C. Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV). (2013)
[10] Choi, J., Wang, Z., Lee, S.-C., and Jeon, W. J. A spatio-temporal pyramid matching for video retrieval. Computer Vision and Image Understanding, 117(6):660–669. (2013)
[11] Oh, S., Hoogs, A., Perera, A., Cuntoor, N., Chen, C.-C., Lee, J. T., Mukherjee, S., Aggarwal, J. K., Lee, H., Davis, L., Swears, E., Wang, X., Ji, Q., Reddy, K., Shah, M., Vondrick, C., Pirsiavash, H., Ramanan, D., Yuen, J., Torralba, A., Song, B., Fong, A., Roy-Chowdhury, A., and Desai, M. A large-scale benchmark dataset for event recognition in surveillance video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3153–3160. (2011)
[12] Niebles, J. C., Chen, C.-W., and Fei-Fei, L. Modeling temporal structure of decomposable motion segments for activity classification. In European Conference on Computer Vision (ECCV), pages 392–405. (2010)
[13] Reddy, K. K. and Shah, M. Recognizing 50 human action categories of web videos. Machine Vision and Applications, 24(5):971–981. (2013)
[14] Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. HMDB: a large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV). (2011)
[15] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. One-shot learning of human activity with an MAP adapted GMM and simplex-HMM. IEEE Transactions on Cybernetics, PP(99):1–12. (2016)
[16] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. A time flexible kernel framework for video-based activity recognition. Image and Vision Computing, 48-49:26–36. (2016)
[17] Rodriguez, M., Orrite, C., Medrano, C., and Makris, D. Extended Study for One-shot Learning of Human Activity by a Simplex-HMM. IEEE Transactions on Cybernetics. (Submitted)
[18] Orrite, C., Rodriguez, M., and Medrano, C. One-shot learning of temporal sequences using a distance dependent Chinese Restaurant Process. In Proceedings of the 23rd International Conference on Pattern Recognition, ICPR. (December 2016)
[19] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Spectral Clustering Using Friendship Path Similarity. In Proceedings of the 7th Iberian Conference, IbPRIA. (June 2015)
[20] Orrite, C., Soler, J., Rodriguez, M., Herrero, E., and Casas, R. Image-based location recognition and scenario modelling. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications, VISAPP. (March 2015)
[21] Castán, D., Rodríguez, M., Ortega, A., Orrite, C., and Lleida, E. Vivolab and cvlab - mediaeval 2014: Violent scenes detection affect task. In Working Notes Proceedings of the MediaEval. (October 2014)
[22] Orrite, C., Rodriguez, M., Herrero, E., Rogez, G., and Velastin, S. A. Automatic segmentation and recognition of human actions in monocular sequences. In Proceedings of the 22nd International Conference on Pattern Recognition, ICPR. (August 2014)
[23] Rodriguez, M., Medrano, C., Herrero, E., and Orrite, C. Transfer learning of human poses for action recognition. In 4th International Workshop of Human Behavior Understanding (HBU). (October 2013)
[24] Rodriguez, M., Orrite, C., and Medrano, C. Human action recognition with limited labelled data. In Actas del III Workshop de Reconocimiento de Formas y Analisis de Imagenes, WSRFAI. (September 2013)
[25] Orrite, C., Monforte, P., Rodriguez, M., and Herrero, E. Human Action Recognition under Partial Occlusions. In Proceedings of the 6th Iberian Conference, IbPRIA. (June 2013)
[26] Orrite, C., Rodriguez, M., and Montañes, M. One sequence learning of human actions. In 2nd International Workshop of Human Behavior Understanding (HBU). (November 2011)
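The abstract above notes that flexibility in sequence alignment helps when activity segmentation is not exact. To illustrate what flexible alignment means, the sketch below implements classic dynamic time warping (DTW) on scalar sequences. DTW is a swapped-in, well-known technique chosen for illustration; it is not the kernel framework proposed in the thesis, and the function name `dtw_distance` and the toy sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar sequences.

    Elements may be matched one-to-many, so sequences that differ only
    in local timing receive a small distance.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same sub-action order with one element stretched aligns at zero cost;
# the reversed order does not.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [3, 2, 1]))
```

The second call shows why temporal order matters: the two sequences contain the same values, yet no monotonic alignment can match them cheaply.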

    Similarity reasoning for local surface analysis and recognition

    This thesis addresses the similarity assessment of digital shapes, contributing to the analysis of surface characteristics that are independent of the global shape but are crucial for identifying a model as belonging to the same manufacture, the same origin/culture or the same typology (color, common decorations, common feature elements, compatible style elements, etc.). To tackle this problem, the interpretation of the local surface properties is crucial. We go beyond the retrieval of models or surface patches in a collection of models and address the recognition of geometric patterns across digital models with different overall shapes. To address this challenging problem, the use of both engineered and learning-based descriptions is investigated, building one of the first contributions towards the localization and identification of geometric patterns on digital surfaces. Finally, the recognition of patterns adds a further perspective to the exploration of (large) 3D data collections, especially in the cultural heritage domain. Our work contributes to the definition of methods able to locally characterize geometric and colorimetric surface decorations. Moreover, we showcase our benchmarking activity carried out in recent years on the identification of geometric features and the retrieval of digital models completely characterized by geometric or colorimetric patterns.

    Mapping and Localization in Urban Environments Using Cameras

    In this work we present a system that fully automatically creates a highly accurate visual feature map from image data acquired from within a moving vehicle. Moreover, a system for high-precision self-localization is presented. Furthermore, we present a method to automatically learn a visual descriptor. The map-relative self-localization is centimeter-accurate and allows autonomous driving.