
    Learning error-correcting representations for multi-class problems

    Real life is full of multi-class decision tasks. In the Pattern Recognition field, several methodologies have been proposed to deal with binary problems, obtaining satisfying results in terms of performance. However, extending very powerful binary classifiers to the multi-class case is a complex task. The Error-Correcting Output Codes (ECOC) framework has proven to be a very powerful tool for combining binary classifiers to tackle multi-class problems. However, most combinations of binary classifiers in the ECOC framework overlook the underlying structure of the multi-class problem. In addition, it is still unclear how the error correction of an ECOC design is distributed among the different classes. In this dissertation, we are interested in tackling critical problems of the ECOC framework, such as defining the number of classifiers needed to tackle a multi-class problem, adapting the ECOC coding to multi-class data, and distributing error correction among different pairs of categories. To deal with these issues, this dissertation describes several proposals. 1) We define a new representation for ECOC coding matrices that expresses the pair-wise codeword separability and allows for a deeper understanding of how error correction is distributed among classes. 2) We study the effect of using a logarithmic number of binary classifiers to treat the multi-class problem in order to obtain very efficient models. 3) In order to search for very compact ECOC coding matrices that take into account the distribution of multi-class data, we use Genetic Algorithms that respect the constraints of the ECOC framework. 4) We propose a discrete factorization algorithm that finds an ECOC configuration that allocates the error-correcting capabilities to those classes that are more prone to errors.
The proposed methodologies are evaluated on different real and synthetic data sets: the UCI Machine Learning Repository, handwriting symbols, traffic signs from a Mobile Mapping System, and Human Pose Recovery. The results of this thesis show that significant performance improvements are obtained over traditional ECOC coding designs when the proposed ECOC coding designs are taken into account.
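
The error-correcting idea behind ECOC can be sketched with a toy example. The coding matrix below is a hypothetical 4-class design (not one of the dissertation's proposals) whose codewords are pairwise at Hamming distance 4, so decoding by nearest codeword can correct any single binary classifier error:

```python
# Hypothetical ECOC coding matrix for 4 classes and 6 binary problems.
# Each row is a class codeword; each column is one binary classifier,
# with +1 / -1 marking the two meta-classes it separates.  Every pair
# of rows differs in 4 positions, so one flipped output is correctable.
coding = [
    [ 1,  1,  1,  1,  1,  1],
    [ 1,  1, -1, -1, -1, -1],
    [-1, -1,  1,  1, -1, -1],
    [-1, -1, -1, -1,  1,  1],
]

def decode(outputs, coding):
    """Assign the class whose codeword is nearest, in Hamming
    distance, to the vector of binary classifier outputs."""
    distances = [sum(o != c for o, c in zip(outputs, row)) for row in coding]
    return distances.index(min(distances))

# Codeword of class 2 with its second bit flipped: one classifier
# fails, but nearest-codeword decoding still recovers class 2.
print(decode([-1, 1, 1, 1, -1, -1], coding))  # → 2
```

How error-correcting capability is distributed among class pairs is exactly what the dissertation's codeword-separability representation makes explicit.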

    Generalized Stacked Sequential Learning

    Over the past few decades, machine learning (ML) algorithms have become a very useful tool in tasks where designing and programming explicit, rule-based algorithms is infeasible. Some examples of applications where machine learning has been applied successfully are spam filtering, optical character recognition (OCR), search engines and computer vision. One of the most common tasks in ML is supervised learning, where the goal is to learn, from a set of known labeled input data, a general model able to predict the correct label of unseen examples. In supervised learning it is often assumed that data are independent and identically distributed (i.i.d.). This means that each sample in the data set has the same probability distribution as the others and all are mutually independent. However, classification problems in real-world databases can break this i.i.d. assumption. For example, consider the case of object recognition in image understanding. In this case, if one pixel belongs to a certain object category, it is very likely that neighboring pixels also belong to the same object, with the exception of the borders. Another example is the case of a laughter detection application from voice records. A laugh has a clear pattern alternating voice and non-voice segments; thus, discriminant information comes from the alternating pattern, and not just from the samples on their own. Another example can be found in the case of signature section recognition in an e-mail. In this case, the signature is usually found at the end of the mail, so important discriminant information is found in the context. Another case is part-of-speech tagging, in which each example describes a word that is categorized as noun, verb, adjective, etc. In this case it is very unlikely that patterns such as [verb, verb, adjective, verb] occur. All these applications present a common feature: the sequence/context of the labels matters. Sequential learning (25) breaks the i.i.d. assumption and assumes that samples are not independently drawn from a joint distribution of the data samples X and their labels Y. In sequential learning the training data actually consist of sequences of pairs (x, y), so that neighboring examples exhibit some kind of correlation. Sequential learning applications usually consider one-dimensional relationship support, but these types of relationships appear very frequently in other domains, such as images or video. Sequential learning should not be confused with time series prediction. The main difference between the two problems lies in the fact that sequential learning has access to the whole data set before any prediction is made, and the full set of labels is to be provided at the same time, whereas time series prediction has access to real labels only up to the current time t, and the goal is to predict the label at t + 1. Another related but different problem is sequence classification, where the task is to predict a single label for an input sequence. In the image domain, the sequential learning goal is to classify the pixels of an image taking into account their context, while sequence classification is equivalent to classifying one full image as one class. Sequential learning has been addressed from different perspectives: from the point of view of meta-learning, by means of sliding window techniques, recurrent sliding windows or stacked sequential learning, where the method is formulated as a combination of classifiers; or from the point of view of graphical models, using for example Hidden Markov Models or Conditional Random Fields. In this thesis, we are concerned with meta-learning strategies. Cohen et al. (17) showed that stacked sequential learning (SSL from now on) performed better than CRFs and HMMs on a subset of problems called “sequential partitioning problems”, which are characterized by long runs of identical labels. Moreover, SSL is computationally very efficient, since it only needs to train two classifiers a constant number of times. Considering these benefits, we decided to explore sequential learning in depth using SSL and to generalize the Cohen architecture to deal with a wider variety of problems.
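
The two-stage SSL idea can be sketched as follows. The toy threshold rules below stand in for the trained base and meta classifiers, and the window of neighbouring base predictions plays the role of the extended feature set; this is an illustration of the general scheme, not the thesis's generalized architecture:

```python
# Toy sketch of two-stage stacked sequential learning on a 1-D label
# sequence with long runs of identical labels.

def base_predict(x):
    # First-stage classifier: a per-sample decision that ignores context.
    return 1 if x > 0.5 else 0

def extend(features, preds, w=1):
    """Pair each sample with the base predictions of its w neighbours
    on each side, padding the ends of the sequence with 0."""
    padded = [0] * w + preds + [0] * w
    return [(x, padded[i:i + 2 * w + 1]) for i, x in enumerate(features)]

def meta_predict(x, context):
    # Second-stage classifier: majority vote over the prediction window
    # (a learned meta classifier would also use the original feature x).
    return 1 if 2 * sum(context) > len(context) else 0

features = [0.9, 0.8, 0.4, 0.7, 0.9, 0.1, 0.2, 0.1]
base = [base_predict(x) for x in features]           # isolated error at index 2
final = [meta_predict(x, ctx) for x, ctx in extend(features, base)]
print(base)   # [1, 1, 0, 1, 1, 0, 0, 0]
print(final)  # [1, 1, 1, 1, 1, 0, 0, 0]  (the isolated error is smoothed out)
```

The correction of the isolated error inside a long run is precisely the behaviour that makes SSL effective on sequential partitioning problems.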

    Developing a pipeline for gait analysis with a side-view depth sensor

    This thesis presents computational methods for conducting gait analysis with a side-view depth sensor. First, a method to segment human body parts in a depth image is presented. A standard supervised segmentation algorithm is run on a novel graph representation of the depth image, and it is demonstrated that the new graph structure improves the accuracy of the segmentation. This contribution is intended to allow fast labelling of depth images for training a human joint predictor. Next, a method is presented to select accurate 3D positions of human joints from multiple proposals generated by a predictor from a side-view depth image. Finally, a gait analysis system is built on the joint selection process. The system calculates standard parameters used in clinical gait analysis. Walking trials were measured concurrently by a pressure-sensitive walkway and a side-view depth sensor, and the estimated gait parameters are validated against the ground-truth parameters from the walkway. As future work, the initial segmentation process could be applied to multi-view depth images for training a view-invariant joint predictor; the proposed gait analysis system could then be applied to the predicted joints.
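
As a rough illustration of the kind of clinical gait parameters such a system computes, the sketch below derives step length and cadence from hypothetical heel-strike positions and times. In the thesis these come from joints predicted in side-view depth images and are validated against a pressure-sensitive walkway; the values here are invented:

```python
import math

# Hypothetical heel positions (metres) at successive heel strikes of
# alternating feet, paired with timestamps (seconds).
heel_strikes = [
    ((0.00, 0.0, 0.0), 0.00),
    ((0.62, 0.0, 0.0), 0.55),
    ((1.25, 0.0, 0.0), 1.10),
    ((1.86, 0.0, 0.0), 1.66),
]

def step_lengths(strikes):
    """Distance between consecutive heel-strike positions."""
    return [math.dist(p0, p1)
            for (p0, _), (p1, _) in zip(strikes, strikes[1:])]

def cadence(strikes):
    """Steps per minute over the recorded trial."""
    n_steps = len(strikes) - 1
    duration = strikes[-1][1] - strikes[0][1]
    return 60.0 * n_steps / duration

print([round(s, 2) for s in step_lengths(heel_strikes)])  # [0.62, 0.63, 0.61]
print(round(cadence(heel_strikes), 1))                    # 108.4
```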

    Augmentieren von Personen in Monokularen Videodaten

    When aiming at realistic video augmentation, i.e. the embedding of virtual, 3-dimensional objects into a scene's original content, a series of challenging problems has to be solved. This is especially the case when working with solely monocular input material, as important additional 3D information is missing and has to be recovered during the process, if necessary. In this work, I will present a semi-automatic strategy to tackle this task by providing solutions to individual problems in the context of virtual clothing as an example of realistic video augmentation. Starting with two different approaches for monocular pose and motion estimation, I will show how to build a 3D human body model by estimating detailed shape information as well as basic surface material properties. This information further allows extracting a dynamic illumination model from the provided input material. The illumination model is particularly important for rendering a realistic virtual object and adds a lot of realism to the final video augmentation. The animated human model is able to interact with virtual 3D objects and is used in the context of virtual clothing to animate simulated garments. To achieve the desired realism, I present an additional image-based compositing approach that realistically embeds the simulated garment into the original scene content. Combining the presented approaches provides an integrated strategy for the realistic augmentation of actors in monocular video sequences.

    Analysis of human motion with vision systems: kinematic and dynamic parameters estimation

    This work presents a multi-camera motion capture system able to digitize, measure and analyse human motion. A key feature of this system is an easily wearable garment printed with a color-coded pattern. The pattern of colored markers allows simultaneous reconstruction of the shape and motion of the subject. With the information gathered we can also estimate both kinematic and dynamic motion parameters. In the framework of this research we developed algorithms to design the color-coded pattern, perform 3D shape reconstruction, estimate kinematic and dynamic motion parameters, and calibrate the multi-camera system. We paid particular attention to estimating the uncertainty of the kinematic parameters, also comparing the results obtained with commercial systems. The work also presents an overview of some real-world applications in which the developed system has been used as a measurement tool.
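
As a small example of a kinematic parameter derivable from reconstructed 3D marker positions, the sketch below computes a joint angle (e.g. knee flexion) from three hypothetical points; it illustrates the general idea only and is not the estimation procedure used in this work:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 3-D points a-b-c,
    e.g. hip-knee-ankle for a knee angle."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

# Hypothetical marker positions reconstructed by the capture system.
hip, knee, ankle = (0.0, 1.0, 0.0), (0.0, 0.5, 0.1), (0.0, 0.0, 0.0)
print(round(joint_angle(hip, knee, ankle), 1))  # 157.4
```

Propagating marker-position uncertainty through such formulas is what the uncertainty analysis mentioned above addresses.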

    Multimedia

    The ubiquitous and effortless digital data capture and processing capabilities offered nowadays by the majority of devices lead to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics this book deals with include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic-gap bridging for multimedia content, and novel multimedia applications.

    Vascular Segmentation Algorithms for Generating 3D Atherosclerotic Measurements

    Atherosclerosis manifests as plaques within the large arteries of the body and remains a leading cause of mortality and morbidity in the world. Major cardiovascular events may occur in patients without known preexisting symptoms, so it is important to monitor progression and regression of the plaque burden in the arteries when evaluating a patient's response to therapy. In this dissertation, our main focus is quantification of plaque burden in the carotid and femoral arteries, which are major sites of plaque formation and are straightforward to image noninvasively due to their superficial location. Recently, 3D measurements of plaque burden have been shown to be more sensitive to changes of plaque burden than one-/two-dimensional measurements. However, despite the advancement of 3D noninvasive imaging technology with rapid acquisition capabilities, and the high sensitivity of 3D measurements of plaque burden, such measurements are still not widely used due to the inordinate amount of time and effort required to delineate artery walls and plaque boundaries in the images. Therefore, the objective of this dissertation is to develop novel semi-automated segmentation methods that alleviate the measurement burden on the observer for segmentation of the outer wall and lumen boundaries from: (1) 3D carotid ultrasound (US) images, (2) 3D carotid black-blood magnetic resonance (MR) images, and (3) 3D femoral black-blood MR images. Segmentation of the carotid lumen and outer wall from 3D US images is a challenging task due to low image contrast, for which no method had been previously reported. Initially, we developed a 2D slice-wise segmentation algorithm based on the level set method, which was then extended to 3D. The 3D algorithm required fewer user interactions than manual delineation and the 2D method, reducing user time by ≈79% (1.72 vs. 8.3 min) compared to manual segmentation for generating 3D-based measurements with high accuracy (Dice similarity coefficient (DSC) > 90%). Secondly, we developed a novel 3D multi-region segmentation algorithm, which simultaneously delineates both the carotid lumen and outer wall surfaces from MR images by evolving two coupled surfaces using a convex max-flow-based technique. The algorithm requires user interaction only on a single transverse slice of the 3D image to generate 3D surfaces of the lumen and outer wall. The algorithm was parallelized on graphics processing units (GPUs) to increase computational speed, reducing user time by 93% (0.78 vs. 12 min) compared to manual segmentation. Moreover, the algorithm yielded high accuracy (DSC > 90%) and high precision (intra-observer CV < 5.6% and inter-observer CV < 6.6%). Finally, we developed and validated an algorithm based on a convex max-flow formulation to segment the femoral arteries, which enforces a tubular shape prior and an inter-surface consistency between the outer wall and lumen to maintain a minimum separation distance between the two surfaces. The algorithm requires the observer to choose only about 11 points on the medial axis of the artery to yield the 3D surfaces of the lumen and outer wall, reducing operator time by 97% (1.8 vs. 70-80 min) compared to manual segmentation. Furthermore, the proposed algorithm reported a DSC greater than 85% and small intra-observer variability (CV ≈ 6.69%). In conclusion, the development of robust semi-automated algorithms for generating 3D measurements of plaque burden may accelerate the translation of 3D measurements to clinical trials and subsequently to clinical care.
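
The accuracy metric cited throughout, the Dice similarity coefficient, can be sketched on toy binary masks; in the validation itself it would be computed voxel-wise between algorithmic and manual 3D segmentations:

```python
# Minimal sketch of the Dice similarity coefficient (DSC) on flat
# binary masks (a 3-D volume would simply be flattened first).

def dice(mask_a, mask_b):
    """DSC = 2 |A ∩ B| / (|A| + |B|) for equal-size binary masks."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / size

algorithmic = [1, 1, 1, 1, 0, 0, 0, 0]  # toy segmentation result
manual      = [1, 1, 1, 0, 1, 0, 0, 0]  # toy manual delineation
print(dice(algorithmic, manual))  # 0.75
```

A DSC above 0.9, as reported for the carotid algorithms, indicates that the automated and manual boundaries overlap almost completely.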

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory in the setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal to be considered.

    A Gesture Recognition System for Detecting Behavioral Patterns of ADHD

    We present an application of gesture recognition using an extension of dynamic time warping (DTW) to recognize behavioral patterns of attention deficit hyperactivity disorder (ADHD). We propose an extension of DTW using one-class classifiers in order to encode the variability of a gesture category and thus perform an alignment between a gesture sample and a gesture class. We model the set of gesture samples of a given gesture category using either Gaussian mixture models or an approximation of convex hulls. Thus, we add a theoretical contribution to the classical warping path in DTW by including local modeling of intra-class gesture variability. This methodology is applied in a clinical context, detecting a group of ADHD behavioral patterns defined by experts in psychology/psychiatry, to support clinicians in the diagnostic procedure. The proposed methodology is tested on a novel multimodal dataset (RGB plus depth) of recordings of children with ADHD exhibiting behavioral patterns. We obtain satisfying results when compared to standard state-of-the-art approaches in the DTW context.
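
For reference, classical DTW, which the proposed method extends with one-class models (GMMs or convex-hull approximations) of each gesture class, can be sketched as follows; the extension itself is omitted here:

```python
# Minimal sketch of classical dynamic time warping between two 1-D
# sequences of samples.

def dtw(s, t):
    """Return the DTW alignment cost between sequences s and t."""
    inf = float("inf")
    n, m = len(s), len(t)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])  # local distance between samples
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Same gesture shape performed at different speeds aligns perfectly;
# a different shape accumulates cost along the warping path.
print(dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]))  # 0.0
print(dtw([0, 1, 2, 1, 0], [2, 2, 2, 2, 2]))        # 6.0
```

The proposal above replaces the single-template comparison with an alignment of a sample against a model of the whole gesture class.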