563 research outputs found

    Detection of Control Structures in Spoken Utterances

    Get PDF

    Exploiting Deep Semantics and Compositionality of Natural Language for Human-Robot-Interaction

    Full text link
    We develop a natural language interface for human-robot interaction that implements reasoning about deep semantics in natural language. To realize the required deep analysis, we employ methods from cognitive linguistics, namely the modular and compositional framework of Embodied Construction Grammar (ECG) [Feldman, 2009]. Using ECG, robots are able to solve fine-grained reference resolution problems and other issues related to deep semantics and compositionality of natural language. This also includes verbal interaction with humans to clarify commands and queries that are too ambiguous to be executed safely. We implement our NLU framework as a ROS package and present proof-of-concept scenarios with different robots, as well as a survey on the state of the art

    Examining the cognitive costs of counterfactual language comprehension: Evidence from ERPs

    Get PDF
    Recent empirical research suggests that understanding a counterfactual event (e.g. ‘If Josie had revised, she would have passed her exams’) activates mental representations of both the factual and counterfactual versions of events. However, it remains unclear when readers switch between these models during comprehension, and whether representing multiple ‘worlds’ is cognitively effortful. This paper reports two ERP studies where participants read contexts that set up a factual or counterfactual scenario, followed by a second sentence describing a consequence of this event. Critically, this sentence included a noun that was either consistent or inconsistent with the preceding context, and either included a modal verb to indicate reference to the counterfactual-world or not (thus referring to the factual-world). Experiment 2 used adapted versions of the materials used in Experiment 1 to examine the degree to which representing multiple versions of a counterfactual situation makes heavy demands on cognitive resources by measuring individuals’ working memory capacity. Results showed that when reference to the counterfactual-world was maintained by the ongoing discourse, readers correctly interpreted events according to the counterfactual-world (i.e. showed larger N400 for inconsistent than consistent words). In contrast, when cues referred back to the factual-world, readers showed no difference between consistent and inconsistent critical words, suggesting that they simultaneously compared information against both possible worlds. These results support previous dual-representation accounts for counterfactuals, and provide new evidence that linguistic cues can guide the reader in selecting which world model to evaluate incoming information against. Crucially, we reveal evidence that maintaining and updating a hypothetical model over time relies upon the availability of cognitive resources

    AN ANALYSIS OF INTONATION PATTERNS IN ECUADORIAN CUENCANO SPANISH: A SP_ToBI DESCRIPTION

    Get PDF
    El Cantado Cuencano ‘Cuencano singing’ constitutes the hallmark of Cuenca citizens. This colloquially described intonational feature is what makes Cuencano Spanish one of the most prosodically interesting Andean dialects in the country of Ecuador. There is, however, a lack of scientific research conducted on this dialect’s intonation, which can be considered under-documented up to this point. Therefore, the main objective of the present study was to begin to analyze and document Cuencano Spanish intonation patterns. In addition, this research also aimed to provide scientific evidence and draw plausible conclusions to support or refute the impressionistic observations about the Indigenous origins of Cuencano singing. A sample of 550 utterances produced by 5 male and 5 female participants was collected in order to conduct this research. The sample comprised 11 categories that included declarative statements, yes/no questions, exclamative statements, wh-questions, imperatives, lists, conditionals, tag-questions, interjections, negative statements, and vocatives. The tokens were analyzed using Praat and labeled by implementing the Spanish version of the Tones and Break Indices system (Sp_ToBI). It was found that the presence of the emphatic pitch accent labeled as L+^H* and the high-frequency appearance of bitonal pitch accents, such as L+H* and H+L*, in almost every token in the data set suggest that Cuencanos speak with a variety of degrees of tonal emphasis. This translates into a mixture of a substantial number of rising and falling tones found in Cuencanos’ speech. These findings account for the appearance of the highly marked singing quality of Cuencano Spanish or Cantado Cuencano. They may also be linked to impressionistic descriptions, such as esdrujulizacion, and the influence that Indigenous languages and culture had on Cuencano Spanish

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    Get PDF
    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and analyse the recordings to find instruction primitives. These are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that can use timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is on the translation of rule-based instructions. Most multi-modal interfaces to date have only considered sequential instructions. The combination of frame-based reasoning, a knowledge base organised as an ontology and a problem-solver engine is used to store these rules. The understanding of rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system of the agent implementation is also described. Tests to confirm the implementation by playing back the corpus are presented. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors due to sentences not being covered by the grammar does not decrease at an acceptable rate when new grammar rules are introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways
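    The grouping-and-unification step of such a multi-modal integration algorithm can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the thesis's actual implementation: the `Event` structure, the one-second grouping window, and the slot-conflict unification rule are all assumptions made for the sketch.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Event:
        modality: str   # "speech" or "gesture" (hypothetical representation)
        start: float    # onset time in seconds
        end: float      # offset time in seconds
        frame: dict     # partial semantic frame, e.g. {"action": "place"}

    def temporally_close(a, b, window=1.0):
        """Group two events whose time spans lie within `window` seconds."""
        return a.start <= b.end + window and b.start <= a.end + window

    def unify(a_frame, b_frame):
        """Merge two partial frames; return None if any slot conflicts."""
        merged = dict(a_frame)
        for slot, value in b_frame.items():
            if slot in merged and merged[slot] != value:
                return None
            merged[slot] = value
        return merged

    def integrate(speech_events, gesture_events, window=1.0):
        """Pair each speech event with exactly one compatible, temporally
        close gesture; otherwise emit a clarification question instead of
        forcing a pairing (mirroring the 'ask in ambiguous cases' behaviour)."""
        pairings, questions = [], []
        unused = list(gesture_events)
        for s in speech_events:
            candidates = [g for g in unused
                          if temporally_close(s, g, window)
                          and unify(s.frame, g.frame) is not None]
            if len(candidates) == 1:
                g = candidates[0]
                unused.remove(g)
                pairings.append(unify(s.frame, g.frame))
            else:
                questions.append(f"Which object did you mean by: {s.frame}?")
        return pairings, questions
    ```

    With a single pointing gesture overlapping an underspecified command, the frames unify; with two compatible gestures, the algorithm asks rather than guesses.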

    What’s the Matter? Knowledge Acquisition by Unsupervised Multi-Topic Labeling for Spoken Utterances

    Get PDF
    Systems such as Alexa, Cortana, and Siri appear rather smart. However, they only react to predefined wordings and do not actually grasp the user's intent. To overcome this limitation, a system must understand the topics the user is talking about. Therefore, we apply unsupervised multi-topic labeling to spoken utterances. Although topic labeling is a well-studied task on textual documents, its potential for spoken input is almost unexplored. Our approach for topic labeling is tailored to spoken utterances; it copes with short and ungrammatical input. The approach is two-tiered. First, we disambiguate word senses. We utilize Wikipedia as pre-labeled corpus to train a naïve Bayes classifier. Second, we build topic graphs based on DBpedia relations. We use two strategies to determine central terms in the graphs, i.e. the shared topics. One focuses on the dominant senses in the utterance and the other covers as many distinct senses as possible. Our approach creates multiple distinct topics per utterance and ranks results. The evaluation shows that the approach is feasible; the word sense disambiguation achieves a recall of 0.799. Concerning topic labeling, in a user study subjects assessed that in 90.9% of the cases at least one proposed topic label among the first four is a good fit. With regard to precision, the subjects judged that 77.2% of the top ranked labels are a good fit or good but somewhat too broad (Fleiss' kappa κ = 0.27). We illustrate areas of application of topic labeling in the field of programming in spoken language. With topic labeling applied to the spoken input as well as ontologies that model the situational context we are able to select the most appropriate ontologies with an F1-score of 0.907
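    The second tier, ranking shared topics by centrality in a relation graph, can be sketched as below. This is a minimal illustration, not the paper's method: the paper derives edges from DBpedia relations and uses two dedicated centrality strategies, whereas this sketch takes a plain relation dictionary and ranks nodes by degree centrality only.

    ```python
    from collections import defaultdict

    def topic_labels(senses, related, top_k=4):
        """Rank candidate topic labels for an utterance.
        `senses`  -- disambiguated word senses found in the utterance
        `related` -- maps each sense to the terms it is linked to
                     (a stand-in for DBpedia relations)
        A term linked to many of the utterance's senses sits centrally
        in the graph and is proposed as a shared topic label."""
        graph = defaultdict(set)
        for sense in senses:
            for term in related.get(sense, ()):
                graph[sense].add(term)   # undirected edge sense <-> term
                graph[term].add(sense)
        # degree centrality: number of neighbours per node
        ranked = sorted(graph, key=lambda node: len(graph[node]), reverse=True)
        return ranked[:top_k]
    ```

    For an utterance whose senses both link to a common term, that term outranks the individual senses, which matches the intuition of a "shared topic".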

    Towards Programming in Natural Language: Learning New Functions from Spoken Utterances

    Get PDF
    Systems with conversational interfaces are rather popular nowadays. However, their full potential is not yet exploited. For the time being, users are restricted to calling predefined functions. Soon, users will expect to customize systems to their needs and create their own functions using nothing but spoken instructions. Thus, future systems must understand how laypersons teach new functionality to intelligent systems. The understanding of natural language teaching sequences is a first step toward comprehensive end-user programming in natural language. We propose to analyze the semantics of spoken teaching sequences with a hierarchical classification approach. First, we classify whether an utterance constitutes an effort to teach a new function or not. Afterward, a second classifier locates the distinct semantic parts of teaching efforts: declaration of a new function, specification of intermediate steps, and superfluous information. For both tasks we implement a broad range of machine learning techniques: classical approaches, such as Naïve Bayes, and neural network configurations of various types and architectures, such as bidirectional LSTMs. Additionally, we introduce two heuristic-based adaptations that are tailored to the task of understanding teaching sequences. As a data basis we use 3168 descriptions gathered in a user study. For the first task convolutional neural networks obtain the best results (accuracy: 96.6%); bidirectional LSTMs excel in the second (accuracy: 98.8%). The adaptations improve the first-level classification considerably (plus 2.2 percentage points)
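    The hierarchical pipeline described above can be sketched as follows. The two stages here are rule-based stand-ins for the trained models (a CNN for stage one, a bidirectional LSTM for stage two); the cue phrases and label names are assumptions made purely for illustration.

    ```python
    def is_teaching_effort(utterance: str) -> bool:
        """Stage 1: does the utterance try to teach a new function?
        Stand-in for the trained CNN classifier."""
        cues = ("this is how", "i'll show you", "means")
        return any(c in utterance.lower() for c in cues)

    def label_parts(sentences):
        """Stage 2: tag each sentence of a teaching effort as DECLARATION
        (of the new function), STEP (an intermediate action), or
        SUPERFLUOUS. Stand-in for the bidirectional LSTM."""
        labels = []
        for s in sentences:
            low = s.lower()
            if "this is how" in low or "means" in low:
                labels.append("DECLARATION")
            elif low.startswith(("first", "then", "next", "finally")):
                labels.append("STEP")
            else:
                labels.append("SUPERFLUOUS")
        return labels

    def analyze(sentences):
        """Hierarchical classification: run the fine-grained stage 2
        labeler only if stage 1 detects a teaching effort."""
        if not is_teaching_effort(" ".join(sentences)):
            return None
        return label_parts(sentences)
    ```

    The hierarchy keeps the expensive fine-grained labeler off utterances that are not teaching efforts at all, which is the design rationale the abstract describes.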
