7 research outputs found

    Deep Representation-aligned Graph Multi-view Clustering for Limited Labeled Multi-modal Health Data

    Today, many fields are characterised by extensive quantities of data from a wide range of dissimilar sources and domains. One such field is medicine, in which data contain exhaustive combinations of spatial, temporal, linear, and relational data. Because expert-assessed labels are often lacking, much of this data calls for unsupervised or semi-supervised learning. On the premise that a higher view count provides more ways to recognise commonality across views, contrastive multi-view clustering may be used to train a model to suppress redundancy and otherwise medically irrelevant information. Yet standard multi-view clustering approaches do not account for relational graph data. Recent developments aim to solve this with various graph operations, including graph-based attention. Within deep-learning graph-based multi-view clustering on a single view-invariant affinity graph, however, representation alignment remains unexplored. We introduce Deep Representation-Aligned Graph Multi-View Clustering (DRAGMVC), a novel attention-based graph multi-view clustering model. Comparing maximal performance, our model surpassed the state of the art in eleven out of twelve metrics on Cora, CiteSeer, and PubMed. The model handles view alignment at the sample level by employing a contrastive loss, and relational data through a novel take on graph attention embeddings in which a Markov chain prior increases the receptive field of each layer. For clustering, a graph-induced DDC module is used. GraphSAINT sampling is implemented to control the mini-batch space and capitalise on the Markov prior. Additionally, we present the MIMIC pleural effusion graph multi-modal dataset, consisting of two modalities: 3520 chest X-ray images along with two static views registered within a one-day time frame, namely vital signs and lab tests. Together these make up the three views of the dataset.
We note a significant improvement in separability, view mixing, and clustering performance when comparing DRAGMVC to preceding non-graph multi-view clustering models, suggesting a possible, largely unexplored use case for unsupervised graph multi-view clustering on graph-induced, multi-modal, and complex medical data.
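The abstract does not say how the Markov chain prior is constructed. As a rough illustration only, the sketch below assumes it is a k-step random-walk transition matrix over the graph, which shows why such a prior widens the receptive field: entry (i, j) becomes non-zero as soon as node j lies within k hops of node i. The function names and the toy path graph are hypothetical and are not taken from DRAGMVC.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix (self-loops added) into a Markov chain."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return [[value / sum(row) for value in row] for row in with_loops]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def k_step_prior(adj, steps):
    """Return the `steps`-step transition probabilities (steps >= 1).

    Assumption: this stands in for the paper's Markov prior, whose exact
    form the abstract does not give.
    """
    p = transition_matrix(adj)
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
PATH = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

one_step = k_step_prior(PATH, 1)
two_step = k_step_prior(PATH, 2)
```

With one step, node 0 only reaches itself and node 1; with two steps it also places probability mass on node 2, while node 3 stays out of reach. An attention layer masked by such a matrix could therefore attend beyond immediate neighbours.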

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another by the hands, accompanied by facial expressions and upper-body movements, so that manual and non-manual features convey information simultaneously. Even though standard signs are defined in dictionaries, their realisation varies greatly with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This high variability and the co-articulation effect are a challenging problem for automatic SL processing. Numerous annotated video corpora are needed in order to study the language and to train statistical machine translators. SL video corpora are generally annotated manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible, and time-consuming, and the quality of the results depends on the annotator's knowledge of SL. Combining annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is to study and develop image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address the problem of gloss annotation of SL video corpora. We first aim to detect the limits corresponding to the beginning and end of each sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then, a segmentation method is developed for extracting the hand region even when the hand is in front of the face. Motion features are used for a first temporal segmentation of signs, which is then improved using hand shape features: hand shape makes it possible to delete limits detected in the middle of a sign. Once signs have been segmented, we proceed to gloss recognition using lexical descriptions of signs (phonological models). We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation shows the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.

    Social Network Data Management

    With the increasing usage of online social networks and the semantic web's graph-structured RDF framework, and the rising adoption of networks in fields from biology to social science, there is a rapidly growing need for indexing, querying, and analyzing massive graph-structured data. Facebook has amassed over 500 million users creating huge volumes of highly connected data. Governments have made RDF datasets containing billions of triples available to the public. In the life sciences, researchers have started to connect disparate data sets of research results into one giant network of valuable information. Clearly, networks are becoming increasingly popular and growing rapidly in size, requiring scalable solutions for network data management. This thesis focuses on the following aspects of network data management. We present a hierarchical index structure for external memory storage of network data that aims to maximize data locality. We propose efficient algorithms to answer subgraph matching queries against network databases and discuss effective pruning strategies to improve performance. We show how adaptive cost models can speed up subgraph matching query answering by assigning budgets to index retrieval operations and adjusting the query plan during execution. We develop a cloud-oriented social network database, COSI, which handles network datasets too large for a single computer by partitioning the data across multiple machines and achieves high-performance query answering through asynchronous parallelization and cluster-aware heuristics. Our novel multi-view maintenance algorithm, which exploits common substructures between queries, makes tracking multiple standing queries against a social network database much faster. To capture the uncertainty inherent in social network querying, we define probabilistic subgraph matching queries over deterministic graph data and propose algorithms to answer them efficiently.
Finally, we introduce a general relational machine learning framework and rule-based language, Probabilistic Soft Logic, to learn from and probabilistically reason about social network data, and describe applications to information integration and information fusion.
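To make the notion of a subgraph matching query with pruning concrete, here is a minimal in-memory sketch. It is a generic backtracking matcher and not the thesis's index-based, external-memory algorithm; the two pruning rules (degree filter and adjacency check) and all names are assumptions chosen for illustration.

```python
def subgraph_matches(query, data):
    """Yield injective mappings from query nodes to data nodes.

    `query` and `data` are undirected graphs given as dicts that map each
    node to the set of its neighbours. Every query edge must map onto a
    data edge (non-induced matching).
    """
    # Match the most connected query nodes first so pruning triggers early.
    order = sorted(query, key=lambda node: -len(query[node]))

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        q = order[len(mapping)]
        used = set(mapping.values())
        for d in data:
            if d in used:
                continue
            # Pruning 1: the candidate needs at least the query node's degree.
            if len(data[d]) < len(query[q]):
                continue
            # Pruning 2: already-mapped query neighbours must be data neighbours.
            if any(n in mapping and mapping[n] not in data[d] for n in query[q]):
                continue
            mapping[q] = d
            yield from extend(mapping)
            del mapping[q]

    yield from extend({})


# Hypothetical toy data: a triangle 1-2-3 with a tail node 4 attached to 3.
TRIANGLE = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
NETWORK = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

matches = list(subgraph_matches(TRIANGLE, NETWORK))
```

The triangle query matches the triangle in the data in every node ordering, and the degree filter discards node 4 without any adjacency checks. A disk-backed system would replace the loop over all data nodes with index lookups.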

    Geometric uncertainty models for correspondence problems in digital image processing

    Many recent advances in technology rely heavily on the correct interpretation of an enormous amount of visual information. All available sources of visual data (e.g. cameras in surveillance networks, smartphones, game consoles) must be adequately processed to retrieve the information most relevant to the user. Computer vision and image processing techniques are therefore attracting significant interest and will continue to do so in the near future. Most commonly applied image processing algorithms require a reliable solution to correspondence problems. The solution involves, first, the localization of corresponding points (points visualizing the same 3D point in the observed scene) in the images from distinct sources and, second, the computation of consistent geometric transformations relating correspondences on scene objects. This PhD presents a theoretical framework for solving correspondence problems with geometric features (such as points and straight lines) representing rigid objects in image sequences of complex scenes with static and dynamic cameras. The research focuses on localization uncertainty due to errors in feature detection and measurement, and its effect on each step in the solution of a correspondence problem. Whereas most other recent methods apply statistical models for spatial localization uncertainty, this work considers a novel geometric approach. Localization uncertainty is modeled as a convex polygonal region in the image space. This model can be efficiently propagated throughout the correspondence-finding procedure. It extends easily to transformation uncertainty models and allows confidence measures to be inferred to verify the reliability of the outcome in the correspondence framework. Our procedure aims at finding reliable consistent transformations in sets of few and ill-localized features, possibly containing a large fraction of false candidate correspondences.
The evaluation of the proposed procedure in practical correspondence problems shows that correct consistent correspondence sets are returned in over 95% of the experiments for small sets of 10-40 features contaminated with up to 400% of false positives and 40% of false negatives. The presented techniques prove to be beneficial in typical image processing applications, such as image registration and rigid object tracking.
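One reason a convex polygonal uncertainty region is cheap to propagate is that an affine transformation maps a convex polygon to the convex polygon of its transformed vertices. The sketch below illustrates only that general fact; the thesis's actual propagation rules are not given in the abstract, and the function names and example values are hypothetical.

```python
def apply_affine(polygon, matrix, offset):
    """Map each vertex (x, y) to matrix @ (x, y) + offset."""
    (a, b), (c, d) = matrix
    tx, ty = offset
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in polygon]


def contains(polygon, point):
    """Return True if `point` lies inside or on a convex polygon.

    The vertices must be listed in order around the boundary; either
    winding direction is accepted.
    """
    px, py = point
    signs = set()
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross > 0:
            signs.add(1)
        elif cross < 0:
            signs.add(-1)
    # Inside means the point is never strictly on both sides of the edges.
    return len(signs) <= 1


# Hypothetical feature at (10, 10) with a one-pixel square uncertainty region.
REGION = [(9, 9), (11, 9), (11, 11), (9, 11)]
ROTATE_90 = ((0, -1), (1, 0))
SHIFT = (5, 0)

moved_region = apply_affine(REGION, ROTATE_90, SHIFT)
moved_center = apply_affine([(10, 10)], ROTATE_90, SHIFT)[0]
```

Here the transformed region still encloses the transformed feature location, so a candidate correspondence can be checked with a handful of cross products rather than a statistical test.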

    Recent Advances in Signal Processing

    Signal processing is a critical element of most new technological inventions and poses challenges in a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, favoring closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that fall into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The chapters are completely independent and self-contained, so the interested reader can choose any chapter and skip to another without losing continuity.

    What else does your biometric data reveal? A survey on soft biometrics

    Recent research has explored the possibility of extracting ancillary information from primary biometric traits, viz., face, fingerprints, hand geometry and iris. This ancillary information includes personal attributes such as gender, age, ethnicity, hair color, height, weight, etc. Such attributes are known as soft biometrics and have applications in surveillance and indexing biometric databases. These attributes can be used in a fusion framework to improve the matching accuracy of a primary biometric system (e.g., fusing face with gender information), or can be used to generate qualitative descriptions of an individual (e.g., "young Asian female with dark eyes and brown hair"). The latter is particularly useful in bridging the semantic gap between human and machine descriptions of biometric data. In this paper, we provide an overview of soft biometrics and discuss some of the techniques that have been proposed to extract them from image and video data. We also introduce a taxonomy for organizing and classifying soft biometric attributes, and enumerate the strengths and limitations of these attributes in the context of an operational biometric system. Finally, we discuss open research problems in this field. This survey is intended for researchers and practitioners in the field of biometrics.

    Face tracking with active models for a driver monitoring application

    Lack of attention while driving is one of the main causes of traffic accidents. Monitoring the driver to detect inattention is a complex problem that involves physiological and behavioural elements. A computer vision system for inattention detection comprises several processing stages, and this thesis focuses on tracking the driver's face. The thesis proposes a new set of driver videos, recorded in a real vehicle and in two realistic simulators, that contain most of the behaviours present in driving, including gestures, head turns, interaction with the sound system and other distractions, and drowsiness. This database, RS-DMV, is used to evaluate the performance of the methods proposed in the thesis and of other state-of-the-art methods. The thesis analyses the performance of Active Shape Models (ASM) and Constrained Local Models (CLM), which were considered a priori of interest. Specifically, the Stacked Trimmed ASM (STASM) method, which integrates a series of improvements over the original ASM, was evaluated; it shows high accuracy in all tests when the face is frontal to the camera, although it does not work when the face is turned and its execution speed is very low. CLM runs faster but is much less accurate in all cases. The third method evaluated is Simultaneous Modeling and Tracking (SMAT), which characterises shape and texture incrementally from previously found samples. The texture around each point of the shape that defines the face is modelled by a set of clusters of past samples.
The thesis proposes three clustering methods for the texture as alternatives to the original one, and a shape model trained off-line with a robust fitting function. The proposed alternative methods yield a large improvement both in tracking accuracy and in robustness to head turns, occlusions, gestures, and illumination changes. The proposed methods also have a low computational load and can run at around 100 frames per second on a desktop computer.