    Joint Perceptual Learning and Natural Language Acquisition for Autonomous Robots

    Understanding how children learn the components of their mother tongue and the meanings of each word has long fascinated linguists and cognitive scientists. Robots face a similar challenge in understanding language and perception to allow for natural and effortless human-robot interaction. Acquiring such knowledge is challenging unless it is preprogrammed, which is no easy task either, nor does preprogramming solve the problem of language differences between individuals or of learning the meanings of new words. In this thesis, the problem of bootstrapping knowledge in language and vision for autonomous robots is addressed through novel techniques in grammar induction and in grounding words to the perceptual world. The learning is achieved in a cognitively plausible, loosely supervised manner from raw linguistic and visual data. The visual data are collected using different robotic platforms deployed in real-world and simulated environments and equipped with different sensing modalities, while the linguistic data are collected using online crowdsourcing tools and volunteers. The presented framework does not rely on any particular robot or any specific sensors; rather, it is flexible to whatever modalities the robot can support.

    The learning framework is divided into three processes. First, the raw perceptual data are clustered into a number of Gaussian components to learn the ‘visual concepts’. Second, frequent co-occurrence of words and visual concepts is used to learn the language grounding. Finally, the learned language grounding and visual concepts are used to induce probabilistic grammar rules that model the language structure. In this thesis, the visual concepts refer to: (i) people’s faces and the appearance of their garments; (ii) objects and their perceptual properties; (iii) pairwise spatial relations; (iv) robot actions; and (v) human activities. The visual concepts are learned by first processing the raw visual data to find people and objects in the scene using state-of-the-art techniques in human pose estimation, object segmentation and tracking, and activity analysis. Once found, the concepts are learned incrementally using a combination of techniques: Incremental Gaussian Mixture Models with the Bayesian Information Criterion for simple visual concepts such as object colours and shapes, and spatio-temporal graphs with topic models for more complex visual concepts such as human activities and robot actions. Language grounding is enabled by seeking frequent co-occurrences between words and learned visual concepts; finding the correct grounding is formulated as an integer programming problem that seeks the best many-to-many matching between words and concepts. Grammar induction refers to the process of learning a formal grammar (usually a collection of re-write rules or productions) from a set of observations. In this thesis, Probabilistic Context-Free Grammar rules are generated to model the language by mapping natural language sentences to learned visual concepts, as opposed to traditional supervised grammar induction techniques, where learning is only made possible by manually annotated training examples on large datasets.

    The learning framework attains its cognitive plausibility from a number of sources. First, the learning is achieved by providing the robot with pairs of raw linguistic and visual inputs in a “show-and-tell” procedure akin to how human children learn about their environment. Second, no prior knowledge is assumed about the meanings of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). Third, the knowledge in both language and vision is obtained incrementally: the gained knowledge can evolve to adapt to new observations without the need to revisit previously seen ones. Fourth, the robot learns about the visual world first and then learns how it maps to language, in line with cognitive studies on language acquisition suggesting that human infants develop considerable cognitive understanding of their environment during the pre-linguistic period of their lives. It should be noted that this work does not claim to model how humans learn about objects in their environments; rather, it is inspired by that process.

    For validation, four datasets are used, each containing temporally aligned video clips of people or robots performing activities and sentences describing these clips. The video clips are collected using four robotic platforms: three robot arms in simple block-world scenarios, and a mobile robot deployed in a challenging real-world office environment observing different people performing complex activities. The linguistic descriptions for these datasets are obtained using Amazon Mechanical Turk and volunteers. The analysis performed on these datasets suggests that the learning framework is suitable for learning from complex real-world scenarios. The experimental results show that the learning framework enables (i) acquiring correct visual concepts from visual data; (ii) learning the word grounding for each of the extracted visual concepts; (iii) inducing correct grammar rules to model the language structure; (iv) using the gained knowledge to understand previously unseen linguistic commands; and (v) using the gained knowledge to generate well-formed natural language descriptions of novel scenes.
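    A minimal sketch of the simple-concept learning step above, assuming batch data and scikit-learn's GaussianMixture in place of the thesis's incremental variant; all names and the toy data are illustrative:

```python
# Sketch: choose the number of Gaussian "visual concepts" for a batch of
# colour features by minimising the Bayesian Information Criterion (BIC).
# The thesis uses an *incremental* GMM; this batch version only
# approximates that step.
import numpy as np
from sklearn.mixture import GaussianMixture

def learn_visual_concepts(features: np.ndarray, max_components: int = 10):
    """Fit GMMs with 1..max_components and keep the BIC-optimal one."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(features)
        bic = gmm.bic(features)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model  # each Gaussian component is one learned concept

# Toy usage: three colour clusters in RGB space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.05, (50, 3))
               for c in ([1, 0, 0], [0, 1, 0], [0, 0, 1])])
print(learn_visual_concepts(X).n_components, "visual concepts found")
```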

    Health Data Mining

    In the field of health care, data analysis techniques are increasingly popular and have even become indispensable for managing the large volumes of data produced for and by a patient. Two themes are addressed in this HDR presentation. The first concerns the definition, formalisation, implementation, and validation of analysis methods for describing the content of medical databases. I have been particularly interested in sequential data. I extended the classical notion of a sequential pattern to incorporate contextual and spatial components, as well as the partial order of the elements composing the patterns. This new information enriches the initial semantics of these patterns. The second theme focuses on analysing the productions and interactions of patients through social media. I have mainly worked on methods for analysing patients' narrative productions according to their temporality, their topics, the associated sentiments, and the role and reputation of the speaker expressed in the messages.
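    As a minimal illustration of the basic sequential-pattern notion discussed above (not the extended contextual and spatial variants contributed by this work), a short sketch of support counting over a base of event sequences; the toy care-pathway data are invented:

```python
# Sketch: support counting for a sequential pattern, the building block
# that the HDR work extends with contextual and spatial components.
# A pattern matches a sequence if its items occur in order (gaps allowed).
def matches(pattern: list, sequence: list) -> bool:
    it = iter(sequence)
    return all(item in it for item in pattern)  # consumes `it` in order

def support(pattern: list, database: list) -> float:
    """Fraction of sequences in the database containing the pattern."""
    return sum(matches(pattern, s) for s in database) / len(database)

# Toy database: each sequence is one patient's care-event history
db = [["consult", "scan", "surgery", "rehab"],
      ["consult", "surgery", "rehab"],
      ["scan", "consult", "rehab"]]
print(support(["consult", "surgery"], db))  # 2/3 of patients
```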

    Human Factors in Automated and Robotic Space Systems: Proceedings of a symposium. Part 1

    Human factors research likely to produce results applicable to the development of a NASA space station is discussed. The particular sessions covered in Part 1 include: (1) system productivity -- people and machines; (2) expert systems and their use; (3) language and displays for human-computer communication; and (4) computer-aided monitoring and decision making. Papers from each subject area are reproduced, and the discussions from each area are summarized.

    Genetic mapping of metabolic biomarkers of cardiometabolic diseases

    Cardiometabolic disorders (CMDs) are a major public health problem worldwide. The main goal of this thesis is to characterize the genetic architecture of CMD-related metabolites in a Lebanese cohort. In order to maximise the extraction of meaningful biological information from this dataset, an important part of this thesis focuses on evaluating and improving the standard methods currently used for molecular epidemiology studies. First, I describe MetaboSignal, a novel network-based approach to explore the genetic regulation of the metabolome. Second, I comprehensively compare the recovery of metabolic information across the different 1H NMR strategies routinely used for metabolic profiling of plasma (standard 1D, spin-echo and JRES). Third, I describe a new method for dimensionality reduction of 1H NMR datasets prior to statistical modelling. Finally, I use all this methodological knowledge to search for molecular biomarkers of CMDs in a Lebanese population. Metabolome-wide association analyses identified a number of metabolites associated with CMDs, as well as several associations involving N-glycan units from acute-phase glycoproteins. Genetic mapping of these metabolites validated previously reported gene-metabolite associations and revealed two novel loci associated with CMD-related metabolites. Collectively, this work contributes to the ongoing efforts to characterize the molecular mechanisms underlying complex human diseases.
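    A minimal sketch of the single-locus association test underlying the genetic mapping step described above, assuming additively coded genotypes and one metabolite at a time; this simplifies the actual pipeline (no covariates, no mixed models), and the toy data are invented:

```python
# Sketch: test one SNP (coded 0/1/2) against one metabolite level with
# simple linear regression; a metabolome-wide scan repeats this over all
# SNP-metabolite pairs and corrects for multiple testing.
import numpy as np
from scipy import stats

def association_test(genotypes: np.ndarray, metabolite: np.ndarray):
    """Return (beta, p-value) for metabolite ~ intercept + genotype."""
    X = np.column_stack([np.ones_like(genotypes, dtype=float), genotypes])
    beta, res_ss, *_ = np.linalg.lstsq(X, metabolite, rcond=None)
    n, k = X.shape
    sigma2 = res_ss[0] / (n - k)                  # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    p = 2 * stats.t.sf(abs(t), df=n - k)
    return beta[1], p

# Toy usage: a SNP with a true positive effect on the metabolite
rng = np.random.default_rng(1)
g = rng.integers(0, 3, 500)                       # genotypes
y = 0.3 * g + rng.normal(size=500)                # metabolite levels
print(association_test(g, y))
```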

    Microparticle image processing and field profile optimisation for automated Lab-On-Chip magnetophoretic analytical systems

    The work described in this thesis concerns developments to an analytical microfluidic Lab-On-Chip platform originally developed by Prof Pamme's research group at the University of Hull. This work aims to move away from a traditional laboratory analysis system towards a more effective system design which is fully automated and therefore potentially deployable in applications such as point-of-care medical diagnosis. The microfluidic chip platform comprises an external permanent magnet and a chip with multiple parallel reagent streams through which magnetic micro-particles pass in sequence. These streams may include particles, analyte, fluorescent labels and wash solutions; together they facilitate an on-chip multi-step analytical procedure. Analyte concentration is measured via the fluorescent intensity of the exiting micro-particles. This has previously been experimentally proven for more than one analytical procedure. The work described here has addressed two issues which needed improvement: optimising the magnetic field and automating the measurement process. These topics are related by the fact that an optimal field will reduce anomalies, such as aggregated particles, which may degrade automated measurements.

    For this system, the optimal magnetic field is a homogeneous gradient of sufficient strength to pull the particles across the width of the device during fluid transit along its length. To optimise the magnetic field, COMSOL (a multiphysics simulation program) was used to evaluate a number of multiple-magnet configurations and demonstrate an improved field profile. The simulation approach was validated against experimental data for the original single-magnet design.

    To analyse the results automatically, a software tool has been developed in C++ which takes the image files generated during an experiment and outputs a calibration curve or a specific measurement result. The process involves detection of the particles (using image segmentation) and object tracking. The intensity measurement follows the same procedure as the original manual approach, facilitating comparison, but also includes analysis of particle motion behaviour to allow automatic rejection of data from anomalous particles (e.g. stuck particles). For image segmentation, a novel texture-based technique called Temporal-Adaptive Median Binary Pattern (T-AMBP), combined with the Three Frame Difference method to model the background and extract the foreground, was proposed. This approach builds on the previously developed Adaptive Median Binary Pattern (AMBP) and Gaussian Mixture Model (GMM) approaches to image segmentation. The proposed method successfully detects micro-particles even when they have very low fluorescent intensity, where most previous approaches fail, and is more robust to noise and artefacts. For tracking the micro-particles, we proposed a novel algorithm called "Hybrid Meanshift", which combines Meanshift, Histogram of Oriented Gradients (HOG) matching and optical flow techniques; a Kalman filter was also incorporated to make the tracking robust.

    Processing an experimental data set to generate a calibration curve was demonstrated to give effectively the same results in less than 5 minutes, without requiring experimental experience, compared with at least 2 hours of work by an experienced experimenter using the manual approach.
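    A minimal sketch of the three-frame-difference component of the segmentation scheme described above (the T-AMBP texture model itself is not reproduced here), assuming OpenCV and grayscale frames; the threshold value and usage names are illustrative:

```python
# Sketch: three-frame differencing for foreground (particle) detection.
# A pixel counts as foreground only if it differs from BOTH the previous
# and the next frame, which suppresses the ghosting that plain two-frame
# differencing leaves behind. The thesis combines this with a texture
# model (T-AMBP) for the background.
import cv2
import numpy as np

def three_frame_difference(prev, curr, nxt, thresh=25):
    """Return a binary foreground mask for the middle (curr) frame."""
    d1 = cv2.absdiff(curr, prev)
    d2 = cv2.absdiff(nxt, curr)
    _, m1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, m2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.bitwise_and(m1, m2)
    # Morphological opening removes isolated noise pixels
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# Illustrative usage on three consecutive grayscale frames f0, f1, f2:
# mask = three_frame_difference(f0, f1, f2)
```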

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, joint spatio-temporal modelling is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernova.
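    A minimal sketch of the propagation idea under a first-order Markov assumption: fit a spatial mixture model to the particles at one snapshot and warm-start the fit at the next snapshot from it. Assumes scikit-learn; the paper's actual probabilistic model is not specified here:

```python
# Sketch: propagate a spatial GMM of a manifold across time snapshots by
# warm-starting each fit from the previous snapshot's parameters, so
# stage t depends only on stage t-1 (first-order Markov assumption).
from sklearn.mixture import GaussianMixture

def propagate_models(snapshots, n_components=8):
    """snapshots: list of (n_particles, 3) position arrays, one per time."""
    models, prev = [], None
    for X in snapshots:
        if prev is None:
            gmm = GaussianMixture(n_components=n_components, random_state=0)
        else:
            gmm = GaussianMixture(n_components=n_components,
                                  weights_init=prev.weights_,
                                  means_init=prev.means_,
                                  precisions_init=prev.precisions_)
        gmm.fit(X)
        models.append(gmm)
        prev = gmm
    return models  # one spatial model per temporal stage
```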

    Combining automated processing and customized analysis for large-scale sequencing data

    Extensive application of high-throughput methods in the life sciences has brought substantial new challenges for data analysis. Often many different steps have to be applied to a large number of samples. Here, workflow management systems support scientists through the automated execution of corresponding large analysis workflows. The first part of this cumulative dissertation concentrates on the development of Watchdog, a novel workflow management system for the automated analysis of large-scale experimental data. Watchdog's main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention into workflow execution. A graphical user interface enables workflow construction using a pre-defined toolset without programming experience, and a community sharing platform allows scientists to share toolsets and workflows efficiently. Furthermore, we implemented methods for resuming execution of interrupted or partially modified workflows and for automated deployment of software using package managers and container virtualization. Using Watchdog, we implemented default analysis workflows for typical types of large-scale biological experiments, such as RNA-seq and ChIP-seq.

    Although such standard workflows can be easily applied to new datasets of the same type, at some point they reach their limit and customized methods are required to resolve specific questions. Hence, the second part of this dissertation focuses on combining standard analysis workflows with the development of application-specific bioinformatics approaches to address questions of interest to our biological collaboration partners. The first study concentrates on identifying the binding motif of the ZNF768 transcription factor, which consists of two anchor regions connected by a variable linker region. As standard motif-finding methods detected only the anchors of the motif separately, a custom method was developed for determining the spaced motif including the linker region. The second study focused on the effect of CDK12 inhibition on transcription. Results obtained from standard RNA-seq analysis indicated substantial transcript shortening upon CDK12 inhibition. We thus developed a new measure to quantify the degree of transcript shortening. In addition, a customized meta-gene analysis framework was developed to model RNA polymerase II progression using ChIP-seq data. This revealed that CDK12 inhibition causes an RNA polymerase II processivity defect resulting in the detected transcript shortening. In summary, the methods developed in this thesis represent both general contributions to large-scale sequencing data analysis and solutions to specific questions regarding transcription factor binding and the regulation of elongating RNA polymerase II.
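    A minimal sketch of the meta-gene idea used in the customized ChIP-seq analysis: rescale every gene body to a common length and average coverage across genes, so that polymerase progression profiles become comparable between genes. Names and toy data are illustrative, not Watchdog's API:

```python
# Sketch: meta-gene profile, i.e. average RNA polymerase II ChIP-seq
# coverage over gene bodies rescaled to a common length. A processivity
# defect shows up as a profile that decays towards the 3' end.
import numpy as np

def metagene_profile(coverages, n_bins=100):
    """coverages: per-base coverage arrays, one per gene (varying length)."""
    binned = []
    for cov in coverages:
        # Interpolate coverage onto a common 0..1 gene-body coordinate
        x_old = np.linspace(0.0, 1.0, len(cov))
        x_new = np.linspace(0.0, 1.0, n_bins)
        binned.append(np.interp(x_new, x_old, cov))
    return np.mean(binned, axis=0)

# Toy usage: genes of different lengths, coverage decaying along the body
rng = np.random.default_rng(2)
genes = [np.linspace(10, 3, n) + rng.normal(0, 0.5, n)
         for n in (800, 1500, 2300)]
profile = metagene_profile(genes)
print(profile[0], profile[-1])  # high near the 5' end, lower near the 3' end
```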

    The Omics basis of human health: investigating plasma proteins and their genetic effects on complex traits

    Over the past decade, advances in technology and the growing number of identified genetic variants have led to many important discoveries in the field of precision medicine concerning human biology and pathophysiology. However, it became evident that genomics alone could not properly explain the onset and regulation of the specific molecular mechanisms of certain phenotypes. Studying omics helped fill this gap in genetic research, providing detailed information on the quantification of molecules that are involved in structural and functional processes in the organism. In particular, protein production, levels, and regulation are dynamic and change during the course of one's lifetime. This information has proven fundamental to understanding how certain proteins affect complex phenotypes such as neurological and psychiatric disorders. In this thesis, I describe three groups of analyses conducted over the course of my doctoral programme on different sets of blood plasma proteins and over a broad range of neurological, psychiatric, cardiovascular, and electrophysiology phenotypes. The underlying mechanisms that trigger the onset of psychiatric and neurological conditions are often not limited to the nervous system, but rather stem from multi-system molecular triggers. The first part of the work investigates the frequent co-occurrence and comorbidity of neurological and cardiovascular phenotypes by conducting a genome-wide association (GWA) meta-analysis of 183 neurology-related blood proteins on data from over 12,000 individuals. The second part concerns the bivariate and multivariate analyses conducted on 276 cardiology and inflammatory proteins, while the third describes contributions to consortia focussed on heart rate and electrophysiology. Results from the second and third parts provided information that played an important role in understanding part of the genetic mechanisms of the complex traits of interest. Overall, the results presented in this thesis strongly support the notion that proteomics is an important tool for studying complex traits, and that drug discovery and development should focus on targeting protein synthesis and regulation. Furthermore, the results support the notion that complex diseases involve more than one biological system, and that in order to gain a better understanding of human pathology, it is fundamental to study causes and effects across the entire organism.
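    A minimal sketch of the fixed-effect, inverse-variance-weighted combination that underlies a typical GWA meta-analysis like the one described above; the cohort effect sizes here are invented for illustration:

```python
# Sketch: fixed-effect inverse-variance meta-analysis of per-cohort
# effect estimates for one SNP-protein pair. Each cohort contributes an
# effect size (beta) and its standard error; weights are 1/se^2.
import numpy as np
from scipy import stats

def ivw_meta(betas, ses):
    betas, ses = np.asarray(betas, float), np.asarray(ses, float)
    w = 1.0 / ses**2
    beta_meta = np.sum(w * betas) / np.sum(w)
    se_meta = np.sqrt(1.0 / np.sum(w))
    z = beta_meta / se_meta
    p = 2 * stats.norm.sf(abs(z))
    return beta_meta, se_meta, p

# Toy usage: three cohorts with consistent positive effects
print(ivw_meta(betas=[0.12, 0.09, 0.15], ses=[0.04, 0.05, 0.06]))
```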

    Winona Daily News

    https://openriver.winona.edu/winonadailynews/1625/thumbnail.jp

    Frameshift mutations at the C-terminus of HIST1H1E result in a specific DNA hypomethylation signature

    BACKGROUND: We previously associated HIST1H1E mutations causing Rahman syndrome with a specific genome-wide methylation pattern. RESULTS: Methylome analysis of peripheral blood samples from six affected subjects led us to identify a specific hypomethylated profile. This "episignature" was enriched for genes involved in neuronal system development and function. A computational classifier yielded full sensitivity and specificity in detecting subjects with Rahman syndrome. Applying this model to a cohort of undiagnosed probands allowed us to reach a diagnosis in one subject. CONCLUSIONS: We demonstrate an epigenetic signature in subjects with Rahman syndrome that can be used to reach a molecular diagnosis.
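    A minimal sketch of the kind of episignature classifier described above (the study's actual model and features are not specified here), assuming per-CpG methylation beta values and scikit-learn, with leave-one-out validation given the small affected cohort; the toy data are invented:

```python
# Sketch: classify subjects by a methylation "episignature", using the
# signature CpGs as features and leave-one-out validation because the
# affected cohort is small (here, six cases).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(X, y):
    """X: subjects x signature-CpG beta values; y: 1 = affected."""
    hits = 0
    for train, test in LeaveOneOut().split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        hits += int(clf.predict(X[test])[0] == y[test][0])
    return hits / len(y)

# Toy data: affected subjects hypomethylated at the signature CpGs
rng = np.random.default_rng(3)
cases = rng.normal(0.3, 0.05, (6, 40))      # six affected, lower betas
controls = rng.normal(0.6, 0.05, (20, 40))  # unaffected comparison group
X = np.vstack([cases, controls])
y = np.array([1] * 6 + [0] * 20)
print(loo_accuracy(X, y))
```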