410 research outputs found

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. A continuous SL utterance consists of a sequence of signs performed by the hands, accompanied by facial expressions and upper-body movements, so that manual and non-manual features convey information in parallel. Even though standard signs are defined in dictionaries, their realization varies greatly with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect are a major challenge for automatic SL processing. Numerous annotated video corpora are therefore needed to study the language and to train machine learning methods such as statistical machine translators. SL video corpora are generally annotated manually by linguists or computer scientists experienced in SL, but manual annotation is error-prone, unreproducible and time-consuming, and the quality of the result depends on the annotator's knowledge of SL. Combining annotator expertise with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is to study and develop image processing techniques that assist the annotation of SL video corpora: body-part tracking, hand segmentation, temporal segmentation and gloss recognition. Throughout the thesis we address the problem of gloss annotation. We first aim to detect the limits corresponding to the beginning and end of each sign. This requires several low-level processing steps to segment the signs and to extract motion and hand-shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then, a segmentation method is developed to extract the hand region even when the hand is in front of the face. Motion features are used to produce an initial temporal segmentation of the signs, which is then refined using hand-shape features; hand shape makes it possible to remove limits detected in the middle of a sign. Once the signs have been segmented, visual features are extracted and used for gloss recognition based on phonological (lexical) descriptions of signs. We evaluated our algorithms on international corpora to show their advantages and limitations. The evaluation shows the robustness of the proposed methods to fast dynamics and to the numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.

    Hardware acceleration of the trace transform for vision applications

    Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is described, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.

    The role of infant-directed speech in language development of infants with hearing loss

    It is estimated that approximately two out of every 1000 infants worldwide are born with unilateral or bilateral hearing loss (HL). Congenital HL, which refers to HL present at birth, has major negative effects on infants’ speech and language acquisition. Although such negative effects can be mediated by early access to hearing devices and intervention, the majority of children with HL have delayed language development in comparison with their normal-hearing (NH) peers. The aim of this thesis was to provide a deeper empirical understanding of the acoustic features in infant-directed speech (IDS) to infants with HL compared to infants with NH of the same chronological and the same hearing age. Three specific objectives were set for this thesis. The first objective was to investigate the effects of HL and the degree of hearing experience on the acoustic features of IDS. The second objective was to assess adjustments in IDS features across development in IDS to infants with HL, as they acquire more hearing experience. The third objective was to evaluate the role of specific IDS components such as vowel hyperarticulation and exaggerated prosody in lexical processing in infants with NH from six to 18 months of age, at both neural and behavioural levels. This was achieved by conducting four experiments. The first experiment used a cross-sectional design that assessed the acoustic features in IDS to infants with HL with a specific focus on whether and how infants’ chronological age and hearing age may affect these features. Experiment 2 included a longitudinal investigation that focused on the acoustic features of IDS to infants with HL and infants with NH of the same hearing age. We sought to identify how infants’ changing linguistic needs may shape maternal IDS across development. 
Experiments 3 and 4 focused on lexical processing in six-, 10-, and 18-month-old infants, whereby we aimed to identify the role of specific IDS features in facilitating lexical processing in infants with NH at different stages of language acquisition. The results of this thesis demonstrated that mothers adjust their IDS to infants with HL in a similar manner as in IDS to infants with NH. However, some differences are evident in the production of the corner vowels /i/ and /u/. These differences exist even when controlling for the amount of hearing experience had by infants with HL. Additionally, findings demonstrated a relation between vowel production in IDS and infants’ receptive vocabulary indicating that the exaggeration in vowel production in maternal IDS may play a fostering role in infants’ language acquisition. This linguistic role was confirmed as vowel hyperarticulation was also found to facilitate lexical processing at the neural level in 10-month-old infants. However, with regard to older infants (18 months), our findings demonstrated that natural IDS with heightened pitch and vowel hyperarticulation represents the richest input that facilitates infants’ speech processing. In summary, the findings of this thesis suggest that congenital HL in infants affects maternal production of vowels in IDS resulting in less clear vowel categories. This may result from mothers adjusting their vowel production according to infants’ reduced vowel discrimination abilities, thus, adjusting their IDS to infants’ linguistic competence. Additionally, receptive vocabulary seems not to be affected by this, indicating the role of other cues for building a lexicon in infants with HL that warrant further investigation. Furthermore, the findings suggest that pitch and vowel hyperarticulation in IDS play significant roles in facilitating lexical processing in the first two years of life

    Understanding Neural Signals related to Speech Processing in Humans During Sleep

    Many cognitive processes are surprisingly preserved during sleep, including the processing of basic language stimuli. However, whether the sleeping brain can process complex, natural speech is not yet known. The present study used regularized linear regression to understand which features of narrative speech, ranging from low-level acoustic information to higher-level linguistic information, are processed during sleep. Participants were exposed to an intact and scrambled narrative story while they were napping or lying awake. Temporal response functions (TRFs) mapped the relationship between participants’ EEG neural responses and the (1) auditory envelope, (2) word onsets and (3) semantic dissimilarity of words. For all three analyses, delayed but statistically similar TRF components were observed during sleep and wake. These findings suggest that the sleeping brain is capable of low-level auditory processing, speech segmentation and semantic processing of narrative speech. These findings highlight that natural language processing remains remarkably intact during sleep
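As a rough illustration of how a temporal response function can be estimated with regularized linear regression, the sketch below fits ridge-regression weights that map time-lagged copies of one stimulus feature to a single EEG channel. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names `lagged_design` and `fit_trf`, the regularization strength `alpha`, and the lags expressed in samples are all hypothetical choices made for illustration.

```python
import numpy as np

def lagged_design(stimulus, lags):
    """Build a design matrix with one time-shifted copy of the stimulus per lag.

    A positive lag means the response at time t depends on the stimulus at t - lag.
    Samples shifted in from outside the recording are filled with zeros.
    """
    n = len(stimulus)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[:n - lag]
        else:
            X[:n + lag, j] = stimulus[-lag:]
    return X

def fit_trf(stimulus, eeg, lags, alpha=1.0):
    """Ridge-regression estimate of the TRF weights (one weight per lag).

    Solves (X^T X + alpha * I) w = X^T y, where alpha is an assumed
    regularization strength that would normally be chosen by cross-validation.
    """
    X = lagged_design(stimulus, lags)
    gram = X.T @ X + alpha * np.eye(len(lags))
    return np.linalg.solve(gram, X.T @ eeg)
```

In practice the same fit would be repeated per channel and per stimulus feature (envelope, word onsets, semantic dissimilarity), and the resulting weight vectors compared across conditions.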

    Application of Advanced MRI to Fetal Medicine and Surgery

    Robust imaging is essential for comprehensive preoperative evaluation, prognostication, and surgical planning in the field of fetal medicine and surgery. This is a challenging task given the small fetal size and increased fetal and maternal motion which affect MRI spatial resolution. This thesis explores the clinical applicability of post-acquisition processing using MRI advances such as super-resolution reconstruction (SRR) to generate optimal 3D isotropic volumes of anatomical structures by mitigating unpredictable fetal and maternal motion artefact. It paves the way for automated robust and accurate rapid segmentation of the fetal brain. This enables a hierarchical analysis of volume, followed by a local surface-based shape analysis (joint spectral matching) using mathematical markers (curvedness, shape index) that infer gyrification. This allows for more precise, quantitative measurements, and calculation of longitudinal correspondences of cortical brain development. I explore the potential of these MRI advances in three clinical settings: fetal brain development in the context of fetal surgery for spina bifida, airway assessment in fetal tracheolaryngeal obstruction, and the placental-myometrial-bladder interface in placenta accreta spectrum (PAS). For the fetal brain, MRI advances demonstrated an understanding of the impact of intervention on cortical development which may improve fetal candidate selection, neurocognitive prognostication, and parental counselling. This is of critical importance given that spina bifida fetal surgery is now a clinical reality and is routinely being performed globally. For the fetal trachea, SRR can provide improved anatomical information to better select those pregnancies where an EXIT procedure is required to enable the fetal airway to be secured in a timely manner. This would improve maternal and fetal morbidity outcomes associated with haemorrhage and hypoxic brain injury. 
Similarly, in PAS, SRR may assist surgical planning by providing enhanced anatomical assessment and prediction for adverse peri-operative maternal outcome such as bladder injury, catastrophic obstetric haemorrhage and maternal death
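The abstract does not state how curvedness and shape index are computed. One widely used formulation defines both from the two principal curvatures at a surface point, and the sketch below follows it. The sign convention (a shape index of +1 for an umbilic point with positive curvature) and the function names are assumptions made for illustration, and conventions differ between papers.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures.

    Assumed convention: S = (2 / pi) * atan((k_max + k_min) / (k_max - k_min)).
    Returns None for a planar point (both curvatures zero), where S is undefined.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == k_min:
        if k_max == 0:
            return None
        # Umbilic point: the arctangent argument tends to +/- infinity.
        return math.copysign(1.0, k_max)
    return (2.0 / math.pi) * math.atan((k_max + k_min) / (k_max - k_min))

def curvedness(k1, k2):
    """Curvedness: root mean square of the two principal curvatures."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)
```

Shape index describes the local type of surface (cup, saddle, ridge, cap) independently of scale, while curvedness captures how strongly the surface bends, which is why the two are often reported together.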

    Vision-Inertial SLAM using Natural Features in Outdoor Environments

    Simultaneous Localization and Mapping (SLAM) is a recursive probabilistic inference process used for robot navigation when Global Positioning Systems (GPS) are unavailable. SLAM operates by building a map of the robot environment, while concurrently localizing the robot within this map. The ultimate goal of SLAM is to operate anywhere using the environment's natural features as landmarks. Such a goal is difficult to achieve for several reasons. Firstly, different environments contain different types of natural features, each exhibiting large variance in its shape and appearance. Secondly, objects look different from different viewpoints and it is therefore difficult to always recognize them. Thirdly, in most outdoor environments it is not possible to predict the motion of a vehicle using wheel encoders because of errors caused by slippage. Finally, the design of a SLAM system to operate in a large-scale outdoor setting is in itself a challenge. The above issues are addressed as follows. Firstly, a camera is used to recognize the environmental context (e.g., indoor office, outdoor park) by analyzing the holistic spectral content of images of the robot's surroundings. A type of feature (e.g., trees for a park) is then chosen for SLAM that is likely observable in the recognized setting. A novel tree detection system is introduced, which is based on perceptually organizing the content of images into quasi-vertical structures and marking those structures that intersect ground level as tree trunks. Secondly, a new tree recognition system is proposed, which is based on extracting Scale Invariant Feature Transform (SIFT) features on each tree trunk region and matching trees in feature space. Thirdly, dead-reckoning is performed via an Inertial Navigation System (INS), bounded by non-holonomic constraints. INS are insensitive to slippage and varying ground conditions. 
Finally, the developed Computer Vision and Inertial systems are integrated within the framework of an Extended Kalman Filter into a working Vision-INS SLAM system, named VisSLAM. VisSLAM is tested on data collected during a real test run in an outdoor unstructured environment. Three test scenarios are proposed, ranging from semi-automatic detection, recognition, and initialization to a fully automated SLAM system. The first two scenarios are used to verify the presented inertial and Computer Vision algorithms in the context of localization, where results indicate accurate vehicle pose estimation for the majority of its journey. The final scenario evaluates the application of the proposed systems for SLAM, where results indicate successful operation for a long portion of the vehicle journey. Although the scope of this thesis is to operate in an outdoor park setting using tree trunks as landmarks, the developed techniques lend themselves to other environments using different natural objects as landmarks
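The Extended Kalman Filter mentioned above alternates a prediction step (here, from inertial dead-reckoning) with a correction step (here, from landmark observations). The sketch below shows only the generic predict and update equations. It is a minimal sketch, not VisSLAM itself: the function names and the caller-supplied models `f`, `h` and their Jacobians are hypothetical placeholders.

```python
import numpy as np

def ekf_predict(x, P, f, F_jac, Q):
    """Propagate state mean x and covariance P through the motion model f.

    F_jac(x) returns the Jacobian of f at x; Q is the process noise covariance.
    """
    F = F_jac(x)
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H_jac, R):
    """Correct the predicted state with a measurement z.

    h(x) is the predicted measurement, H_jac(x) its Jacobian,
    and R the measurement noise covariance.
    """
    H = H_jac(x)
    innovation = z - h(x)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

In a SLAM setting the state vector would hold the vehicle pose together with the landmark positions, so that each landmark observation corrects both.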

    Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts

    Shapes are generally used to convey meaning. They are used in video games, films and other multimedia, in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real-world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document, at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes, from a lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match with what they expect others to choose but cannot communicate with others to determine an answer. We study whole mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind image Specificity is that different images evoke different descriptions; but ‘Specific’ images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned and predict them for fonts and 3D shapes, respectively, using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes and we demonstrate that font Specificity and the Schelling meshes concept are useful for visualisation, clustering, and search applications. Overall, we find that each concept represents similarities between their respective type of shape, even when there are discontinuities between the shape geometries themselves. The ‘context’ of these similarities is in some kind of abstract or subjective meaning which is consistent among different people

    Advanced machine learning methods for oncological image analysis

    Cancer is a major public health problem, accounting for an estimated 10 million deaths worldwide in 2020 alone. Rapid advances in the field of image acquisition and hardware development over the past three decades have resulted in the development of modern medical imaging modalities that can capture high-resolution anatomical, physiological, functional, and metabolic quantitative information from cancerous organs. Therefore, the applications of medical imaging have become increasingly crucial in the clinical routines of oncology, providing screening, diagnosis, treatment monitoring, and non- or minimally invasive evaluation of disease prognosis. The essential need for medical images, however, has resulted in the acquisition of a tremendous number of imaging scans. Considering the growing role of medical imaging data on one side and the challenges of manually examining such an abundance of data on the other side, the development of computerized tools to automatically or semi-automatically examine the image data has attracted considerable interest. Hence, a variety of machine learning tools have been developed for oncological image analysis, aiming to assist clinicians with repetitive tasks in their workflow. This thesis aims to contribute to the field of oncological image analysis by proposing new ways of quantifying tumor characteristics from medical image data. Specifically, this thesis consists of six studies, the first two of which focus on introducing novel methods for tumor segmentation. The last four studies aim to develop quantitative imaging biomarkers for cancer diagnosis and prognosis. The main objective of Study I is to develop a deep learning pipeline capable of capturing the appearance of lung pathologies, including lung tumors, and integrating this pipeline into the segmentation networks to improve segmentation accuracy. 
The proposed pipeline was tested on several comprehensive datasets, and the numerical quantifications show the superiority of the proposed prior-aware DL framework compared to the state of the art. Study II aims to address a crucial challenge faced by supervised segmentation models: dependency on large-scale labeled datasets. In this study, an unsupervised segmentation approach is proposed based on the concept of image inpainting to segment lung and head-neck tumors in images from single and multiple modalities. The proposed autoinpainting pipeline shows great potential in synthesizing high-quality tumor-free images and outperforms a family of well-established unsupervised models in terms of segmentation accuracy. Studies III and IV aim to automatically discriminate benign from malignant pulmonary nodules by analyzing low-dose computed tomography (LDCT) scans. In Study III, a dual-pathway deep classification framework is proposed to simultaneously take into account the local intra-nodule heterogeneities and the global contextual information. Study IV seeks to compare the discriminative power of a series of carefully selected conventional radiomics methods, end-to-end Deep Learning (DL) models, and deep features-based radiomics analysis on the same dataset. The numerical analyses show the potential of fusing the learned deep features into radiomic features for boosting the classification power. Study V focuses on the early assessment of lung tumor response to the applied treatments by proposing a novel feature set that can be interpreted physiologically. This feature set was employed to quantify the changes in the tumor characteristics from longitudinal PET-CT scans in order to predict the overall survival status of the patients two years after the last session of treatments. 
The discriminative power of the introduced imaging biomarkers was compared against the conventional radiomics, and the quantitative evaluations verified the superiority of the proposed feature set. Whereas Study V focuses on a binary survival prediction task, Study VI addresses the prediction of survival rate in patients diagnosed with lung and head-neck cancer by investigating the potential of spherical convolutional neural networks and comparing their performance against other types of features, including radiomics. While comparable results were achieved in intra-dataset analyses, the proposed spherical-based features show more predictive power in inter-dataset analyses. In summary, the six studies incorporate different imaging modalities and a wide range of image processing and machine-learning techniques in the methods developed for the quantitative assessment of tumor characteristics and contribute to the essential procedures of cancer diagnosis and prognosis.