674 research outputs found

    Joint Perceptual Learning and Natural Language Acquisition for Autonomous Robots

    Understanding how children learn the components of their mother tongue and the meanings of each word has long fascinated linguists and cognitive scientists. Equally, robots face a similar challenge in understanding language and perception to allow for a natural and effortless human-robot interaction. Acquiring such knowledge is a challenging task, unless this knowledge is preprogrammed, which is no easy task either, nor does it solve the problem of language difference between individuals or learning the meaning of new words. In this thesis, the problem of bootstrapping knowledge in language and vision for autonomous robots is addressed through novel techniques in grammar induction and word grounding to the perceptual world. The learning is achieved in a cognitively plausible loosely-supervised manner from raw linguistic and visual data. The visual data is collected using different robotic platforms deployed in real-world and simulated environments and equipped with different sensing modalities, while the linguistic data is collected using online crowdsourcing tools and volunteers. The presented framework does not rely on any particular robot or any specific sensors; rather it is flexible to what the modalities of the robot can support. The learning framework is divided into three processes. First, the perceptual raw data is clustered into a number of Gaussian components to learn the ‘visual concepts’. Second, frequent co-occurrence of words and visual concepts is used to learn the language grounding. Finally, the learned language grounding and visual concepts are used to induce probabilistic grammar rules to model the language structure. In this thesis, the visual concepts refer to: (i) people’s faces and the appearance of their garments; (ii) objects and their perceptual properties; (iii) pairwise spatial relations; (iv) the robot actions; and (v) human activities. 
The visual concepts are learned by first processing the raw visual data to find people and objects in the scene using state-of-the-art techniques in human pose estimation, object segmentation and tracking, and activity analysis. Once found, the concepts are learned incrementally using a combination of techniques: Incremental Gaussian Mixture Models and a Bayesian Information Criterion to learn simple visual concepts such as object colours and shapes; spatio-temporal graphs and topic models to learn more complex visual concepts, such as human activities and robot actions. Language grounding is enabled by seeking frequent co-occurrence between words and learned visual concepts. Finding the correct language grounding is formulated as an integer programming problem to find the best many-to-many matches between words and concepts. Grammar induction refers to the process of learning a formal grammar (usually as a collection of re-write rules or productions) from a set of observations. In this thesis, Probabilistic Context Free Grammar rules are generated to model the language by mapping natural language sentences to learned visual concepts, as opposed to traditional supervised grammar induction techniques where the learning is only made possible by using manually annotated training examples on large datasets. The learning framework attains its cognitive plausibility from a number of sources. First, the learning is achieved by providing the robot with pairs of raw linguistic and visual inputs in a “show-and-tell” procedure akin to how human children learn about their environment. Second, no prior knowledge is assumed about the meaning of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). 
Third, the knowledge in both language and vision is obtained in an incremental manner where the gained knowledge can evolve to adapt to new observations without the need to revisit previously seen ones. Fourth, the robot learns about the visual world first, then it learns about how it maps to language, which aligns with the findings of cognitive studies on language acquisition in human infants that suggest children come to develop considerable cognitive understanding about their environment in the pre-linguistic period of their lives. It should be noted that this work does not claim to be modelling how humans learn about objects in their environments, but rather it is inspired by it. For validation, four different datasets are used which contain temporally aligned video clips of people or robots performing activities, and sentences describing these video clips. The video clips are collected using four robotic platforms: three robot arms in simple block-world scenarios and a mobile robot deployed in a challenging real-world office environment observing different people performing complex activities. The linguistic descriptions for these datasets are obtained using Amazon Mechanical Turk and volunteers. The analysis performed on these datasets suggests that the learning framework is suitable to learn from complex real-world scenarios. The experimental results show that the learning framework enables (i) acquiring correct visual concepts from visual data; (ii) learning the word grounding for each of the extracted visual concepts; (iii) inducing correct grammar rules to model the language structure; (iv) using the gained knowledge to understand previously unseen linguistic commands; and (v) using the gained knowledge to generate well-formed natural language descriptions of novel scenes.
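
The grounding step in the abstract above relies on frequent co-occurrence between words and learned visual concepts. A minimal sketch of that idea follows. It is not the thesis's method: the thesis formulates grounding as an integer programming problem over many-to-many matches, whereas this sketch simply scores each word-concept pair with a Jaccard-style ratio and keeps the best concept per word. The function names, the concept labels (`colour_0`, `shape_1`, ...) and the toy scenes are all hypothetical.

```python
from collections import Counter


def cooccurrence_scores(pairs):
    """Score each (word, concept) pair by how consistently they appear together.

    pairs: list of (words, concepts) tuples, one per described scene.
    The score is joint / (word_count + concept_count - joint), a Jaccard-style
    ratio chosen here for illustration only.
    """
    word_count, concept_count, joint = Counter(), Counter(), Counter()
    for words, concepts in pairs:
        ws, cs = set(words), set(concepts)
        word_count.update(ws)
        concept_count.update(cs)
        joint.update((w, c) for w in ws for c in cs)
    return {
        (w, c): n / (word_count[w] + concept_count[c] - n)
        for (w, c), n in joint.items()
    }


def ground_words(scores):
    """Keep, for every word, the single concept with the highest score."""
    best = {}
    for (word, concept), score in scores.items():
        if word not in best or score > best[word][1]:
            best[word] = (concept, score)
    return best


# Hypothetical toy data: each scene pairs a sentence with detected concept ids.
scenes = [
    (["pick", "up", "the", "red", "block"], ["colour_0", "shape_0"]),
    (["the", "red", "ball"], ["colour_0", "shape_1"]),
    (["the", "green", "block"], ["colour_1", "shape_0"]),
    (["a", "green", "ball"], ["colour_1", "shape_1"]),
]

grounding = ground_words(cooccurrence_scores(scenes))
for word in ("red", "green", "block", "ball"):
    print(word, "->", grounding[word][0])
```

In this toy data each content word always appears with exactly one concept, so a per-word best match is enough. A one-to-one argmax cannot express many-to-many matches, which is the case the integer programming formulation in the abstract is meant to handle.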

    Inference of Adaptive methods for Multi-Stage skew-t Simulated Data

    Multilevel models can be used to account for clustering in data from multi-stage surveys. In some cases, the intra-cluster correlation may be close to zero, so that it may seem reasonable to ignore clustering and fit a single level model. This article proposes several adaptive strategies for allowing for clustering in regression analysis of multi-stage survey data. The approach is based on testing whether the cluster-level variance component is zero. If this hypothesis is retained, then variance estimates are calculated ignoring clustering; otherwise, clustering is reflected in variance estimation. A simple simulation study is used to evaluate the various procedures
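
The test-then-choose logic described in this abstract can be sketched in a much simpler setting than the article's. The sketch below estimates the variance of a sample mean rather than regression coefficients, and it tests for a cluster effect with a permutation test on the one-way ANOVA F statistic. Both are assumptions made for illustration, and the article's actual test and estimators may differ. All function names, the equal cluster sizes and the simulated data are hypothetical.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F: between-cluster mean square over within-cluster mean square."""
    k = len(groups)
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    grand = statistics.fmean(pooled)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(groups, reps, rng):
    """P-value for 'no cluster effect', obtained by shuffling units across clusters."""
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)


def adaptive_variance_of_mean(groups, alpha=0.05, reps=999, seed=0):
    """Return (variance estimate, clustering_used) following the test's outcome."""
    p_value = permutation_pvalue(groups, reps, random.Random(seed))
    pooled = [x for g in groups for x in g]
    if p_value > alpha:
        # Hypothesis retained: treat the data as a simple random sample.
        return statistics.variance(pooled) / len(pooled), False
    # Hypothesis rejected: base the variance on cluster means
    # (this simple form assumes equal-sized clusters).
    means = [statistics.fmean(g) for g in groups]
    return statistics.variance(means) / len(means), True


# Hypothetical data: 8 clusters of 5 units with clearly different cluster levels.
rng = random.Random(42)
clustered = [[level + rng.gauss(0, 1) for _ in range(5)] for level in range(-3, 5)]
variance, clustering_used = adaptive_variance_of_mean(clustered)
print(clustering_used, round(variance, 3))
```

With cluster levels this far apart the test rejects, so the cluster-based variance is used. When the intra-cluster correlation is near zero the procedure would usually fall back to the single-level estimate, which is the situation the abstract is concerned with.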

    Biomechanics of Pharyngeal Deglutitive Function Following Total Laryngectomy

    Copyright © 2016 American Academy of Otolaryngology—Head and Neck Surgery Foundation. Reprinted by permission of SAGE Publications. Objective: Post-laryngectomy surgery, pharyngeal weakness and pharyngoesophageal junction (PEJ) restriction are the underlying candidate mechanisms of dysphagia. We aimed to determine, in laryngectomees, whether: 1) hypopharyngeal propulsion is reduced and/or PEJ resistance is increased; 2) endoscopic dilatation improves dysphagia; and 3) if so, whether symptomatic improvement correlates with reduction in resistance to flow across the PEJ. Methods: Swallow biomechanics were assessed in 30 total laryngectomees. Average peak contractile pressure (hPP) and hypopharyngeal intrabolus pressure (hIBP) were measured from combined high resolution manometry and video-fluoroscopic recordings of barium swallows (2, 5 and 10 ml). Patients were stratified into severe dysphagia (Sydney Swallow Questionnaire (SSQ) > 500) and mild/nil dysphagia (SSQ ≤ 500). In 5 patients, all measurements were repeated after endoscopic dilatation. Results: Dysphagia was reported by 87%, and 57% had severe and 43% had minor/nil dysphagia. Laryngectomees had lower hPP than controls (110±14 mmHg vs 170±15 mmHg; p<0.05), while hIBP was higher (29±5 mmHg vs 6±5 mmHg; p<0.05). There were no differences in hPP between patient groups. However, hIBP was higher in severe than in mild/nil dysphagia (41±10 mmHg vs 13±3 mmHg; p<0.05). Pre-dilatation hIBP (R²=0.97) and its decrement following dilatation (R²=0.98) were good predictors of symptomatic improvement. Conclusion: Increased PEJ resistance is the predominant determinant of dysphagia as it correlates better with dysphagia severity than peak pharyngeal contractile pressure. While both baseline PEJ resistance and its decrement following dilatation are strong predictors of outcome following dilatation, the peak pharyngeal pressure is not. 
PEJ resistance is vital to detect as it is the only potentially reversible component of dysphagia in this context

    What went wrong? The flawed concept of cerebrospinal venous insufficiency

    In 2006, Zamboni reintroduced the concept that chronic impaired venous outflow of the central nervous system is associated with multiple sclerosis (MS), coining the term chronic cerebrospinal venous insufficiency ('CCSVI'). The diagnosis of 'CCSVI' is based on sonographic criteria, which he found exclusively fulfilled in MS. The concept proposes that chronic venous outflow failure is associated with venous reflux and congestion and leads to iron deposition, thereby inducing neuroinflammation and degeneration. The revival of this concept has generated major interest in media and patient groups, mainly driven by the hope that endovascular treatment of 'CCSVI' could alleviate MS. Many investigators tried to replicate Zamboni's results with duplex sonography, magnetic resonance imaging, and catheter angiography. The data obtained generally do not support the 'CCSVI' concept. Moreover, there are no methodologically adequate studies to prove or disprove beneficial effects of endovascular treatment in MS. This review not only gives a comprehensive overview of the methodological flaws and pathophysiologic implausibility of the 'CCSVI' concept, but also summarizes the multimodality diagnostic validation studies and open-label trials of endovascular treatment. In our view, there is currently no basis to diagnose or treat 'CCSVI' in the care of MS patients, outside of the setting of scientific research

    New double stage ranked set sampling for estimating the population mean

    In environmental and many other areas, the main focus of a survey is to measure elements using an efficient and cost-effective sampling technique. One way to achieve this is by using ranked set sampling (RSS). RSS is an alternative sampling technique that can improve the efficiency of estimators when measuring the variable of interest is either costly or time-consuming but ranking its elements in a small set is easy. The purpose of this article is to introduce a new modification of RSS to estimate the mean of the target population. The proposed technique is a double-stage approach that combines median RSS (MRSS) and MiniMax RSS (MMRSS). The performance of the empirical mean and variance estimators based on the proposed technique is compared with their counterparts in MMRSS, RSS and simple random sampling (SRS) via Monte Carlo simulation. Simulation results revealed that the new modification is always more efficient than its counterparts using MMRSS and SRS, while it is more efficient than RSS in many cases, especially when the distribution is asymmetric
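
For readers unfamiliar with RSS, the kind of Monte Carlo comparison this abstract mentions can be sketched as follows. The sketch covers only plain single-stage RSS against SRS with perfect ranking. It does not implement the article's double-stage MRSS/MMRSS procedure, and the set size, the exponential population and the replicate count are assumptions chosen for illustration.

```python
import random
import statistics


def rss_sample(draw, m, rng):
    """One RSS cycle: draw m sets of m units, rank each set,
    and measure only the i-th smallest unit of the i-th set."""
    sample = []
    for i in range(m):
        ranked = sorted(draw(rng) for _ in range(m))
        sample.append(ranked[i])
    return sample


def srs_sample(draw, m, rng):
    """Simple random sample with the same number of measured units."""
    return [draw(rng) for _ in range(m)]


def mc_variance_of_mean(sampler, draw, m, reps, seed):
    """Monte Carlo estimate of the variance of the sample mean."""
    rng = random.Random(seed)
    means = [statistics.fmean(sampler(draw, m, rng)) for _ in range(reps)]
    return statistics.variance(means)


def exp_draw(rng):
    """Hypothetical asymmetric population: exponential with rate 1."""
    return rng.expovariate(1.0)


var_rss = mc_variance_of_mean(rss_sample, exp_draw, m=3, reps=5000, seed=1)
var_srs = mc_variance_of_mean(srs_sample, exp_draw, m=3, reps=5000, seed=2)
relative_efficiency = var_srs / var_rss
print(f"RSS vs SRS relative efficiency: {relative_efficiency:.2f}")
```

A relative efficiency above 1 means the RSS mean has the smaller variance for the same number of measured units. Each RSS cycle still ranks m × m units, which is why the method pays off only when ranking is cheap compared with measuring.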
