957 research outputs found

    Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding

    Full text link
    We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself nicely to a wide range of model architectures. In addition, our architecture computes contextual distances between words and labels to avoid adding contextual windows, thus reducing memory footprint. We validate the approach on established spoken dialogue datasets and show that it can achieve state-of-the-art performance with much fewer trainable parameters.Comment: Accepted for publication at ASRU 201

    The Ouroboros Model

    Get PDF
    At the core of the Ouroboros Model lies a self-referential recursive process with alternating phases of data acquisition and evaluation. Memory entries are organized in schemata. Activation of part of a schema at any given time biases the whole structure and, in particular, its missing features, thus triggering expectations. An iterative recursive monitor process termed ‘consumption analysis’ then checks how well such expectations fit with successive activations. A measure of the goodness of fit, “emotion”, provides feedback as a (self-)monitoring signal. Contradictions between anticipations based on previous experience and actual current data are highlighted, as are minor gaps and deficits. The basic algorithm can be applied to goal-directed movements as well as to abstract rational reasoning, such as weighing evidence for and against remote theories. A sketch is provided of how the Ouroboros Model can shed light on rather different characteristics of human behavior, including learning and meta-learning. Partial implementations have proved effective in dedicated safety systems.
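    The ‘consumption analysis’ loop can be caricatured as follows (a toy sketch under simplifying assumptions of my own: schemata and activations reduced to feature sets, and goodness of fit to a match ratio):

```python
def consumption_analysis(schema, observed):
    """Check how well the activations observed so far fit the expectations
    raised by a partially activated schema. Returns a goodness-of-fit value
    in [0, 1] (the 'emotion' signal), the expected-but-missing features,
    and the unexpected ones. Purely illustrative."""
    expected, seen = set(schema), set(observed)
    fit = len(expected & seen) / len(expected) if expected else 1.0
    return fit, expected - seen, seen - expected
```

    A low fit value flags a contradiction or gap, which in the model's terms would trigger further data acquisition or a switch of schema.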

    Reason from Context with Self-supervised Learning

    Full text link
    Self-supervised learning (SSL) learns to capture discriminative visual features useful for knowledge transfer. To better accommodate the object-centric nature of current downstream tasks such as object recognition and detection, various methods have been proposed to suppress contextual biases or disentangle objects from contexts. Nevertheless, these methods may prove inadequate in situations where object identity must be reasoned from the associated context, such as recognizing or inferring tiny or obscured objects. As an initial effort in the SSL literature, we investigate whether and how contextual associations can be enhanced for visual reasoning within SSL regimes, by (a) proposing a new Self-supervised method with external memories for Context Reasoning (SeCo), and (b) introducing two new downstream tasks, lift-the-flap and object priming, addressing the problems of "what" and "where" in context reasoning. In both tasks, SeCo outperformed all state-of-the-art (SOTA) SSL methods by a significant margin. Our network analysis revealed that the proposed external memory in SeCo learns to store prior contextual knowledge, facilitating target identity inference in the lift-the-flap task. Moreover, we conducted psychophysics experiments and introduced a Human benchmark on the Object Priming dataset (HOP). Our results demonstrate that SeCo exhibits human-like behaviors.
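    An external-memory read of the kind described, attending over stored contextual priors to help infer a target identity, might look like this (a standard key-value attention sketch, not the actual SeCo module; all names and shapes are assumptions):

```python
import numpy as np

def memory_read(query, mem_keys, mem_values):
    """Soft read from an external memory: attend over the stored context
    keys with a dot-product softmax and return the weighted mix of the
    corresponding values."""
    logits = mem_keys @ query                 # (num_slots,)
    w = np.exp(logits - logits.max())         # numerically stable softmax
    w /= w.sum()
    return w @ mem_values                     # (d_value,)
```

    A query built from the visible context would then retrieve the prior most consistent with that context, which is the intuition behind inferring a hidden ("flapped") target from its surroundings.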

    A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System

    Full text link
    Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two critical components of every conversational system: the former captures the necessary information from the user in the form of slots, and the latter generates an appropriate response in accordance with the extracted information. Recently, dialogue systems integrated with complementary information such as images, audio, or video have gained immense popularity. In this work, we propose an end-to-end framework that extracts the necessary slot values from an utterance and generates a coherent response, thereby helping the user achieve their desired goals in a multimodal dialogue system with both textual and visual information. The task of extracting the necessary information depends not only on the text but also on the visual cues present in the dialogue. Similarly, for generation, the previous dialogue context comprising multimodal information is significant for providing coherent and informative responses. We employ a multimodal hierarchical encoder using pre-trained DialoGPT and also exploit the knowledge base (Kb) to provide a stronger context for both tasks. We then design a slot attention mechanism to focus on the necessary information in a given utterance. Finally, a decoder generates the corresponding response for the given dialogue context and the extracted slot values. Experimental results on the Multimodal Dialogue Dataset (MMD) show that the proposed framework outperforms the baseline approaches on both tasks. The code is available at https://github.com/avinashsai/slot-gpt. Comment: Published in the journal Multimedia Tools and Applications
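    The slot attention step, focusing on the tokens of the utterance that carry slot values, can be sketched as a plain dot-product attention (hypothetical names and shapes; the paper's actual mechanism may differ):

```python
import numpy as np

def slot_attention(slot_query, token_states):
    """Weight each encoder token state by its similarity to a slot query,
    then pool the states into a slot-specific context vector. Returns the
    attention weights and the pooled context."""
    logits = token_states @ slot_query        # (num_tokens,)
    w = np.exp(logits - logits.max())         # numerically stable softmax
    w /= w.sum()
    context = w @ token_states                # (d,)
    return w, context
```

    The pooled context vector can then condition both the slot-value extractor and the response decoder, which is how a single attention module can serve the two tasks.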

    The interpretation of noun noun compounds

    Get PDF
    This thesis looks at conceptual combination; in particular, it investigates how noun noun compounds are interpreted. Several themes run throughout the work. Real compounds (e.g. coat hanger, crab apple) are compared to novel ones (e.g. banjo cactus, zip violin). Also, compounds are examined in each of the possible permutations of artefacts (A) (e.g. coat, banjo) and natural kinds (N) (e.g. crab, cactus): AA, AN, NA and NN.
    Experiments 1-4 examine noncompositionality in noun noun compounds. Possible sources of noncompositionality are investigated using both feature listing and feature rating tasks. Although some differences were found, results were similar between different types of compound, with evidence of noncompositionality found in each. The results also confirm that most of the meaning of a noun noun compound is derived from the second constituent (noun2).
    Experiments 5 and 6 look at two different types of compound interpretation: slot filling and property mapping. In Experiment 5, slot filling is found to be the preferred interpretation type overall, but property mapping is more common in compounds composed of two natural kinds (NN). Experiment 6 examines possible factors influencing the choice between slot filling and property mapping interpretations. It was found that constituent similarity plays an important role, and that this interacts with whether or not the constituents have important properties which clash. Experiment 7 looks at compound identification; results suggest that the first constituent (noun1) may be critical in such tasks. Experiment 8 compares the importance of noun1 and noun2 in determining the type of interpretation given to a compound. Neither position is found to be more influential than the other, although relational information does seem to be associated with specific nouns in each position.
    Throughout the thesis, findings are related to current theories of conceptual combination, such as prototype models, the concept specialisation model and theories of compound interpretation by analogy.

    Filled-gap effects in sentence processing: different accounts compared

    Get PDF
    It is widely accepted that the human sentence parsing mechanism is subject to real-time constraints that demand some decisions be made on-line. One area of research in sentence processing has been how long-distance dependencies, in which there is a relation between a fronted phrase ('filler') and its canonical position ('gap'), are constructed on-line. These dependencies, in which an element has been displaced, are interesting to examine because they provide relevant cues to how sentence interpretation proceeds when the information needed to interpret the sentence is not immediately available. Research on the processing of long-distance dependencies has focused on different questions. One question concerns the specific point in time, and the position in the sentence, at which the parser posits gaps while processing long-distance dependencies. Another concerns how the fronted phrase is interpreted and whether the verb is necessary to interpret the dislocated phrase.
    This paper focuses on the different approaches that have been suggested to explain how long-distance dependencies are processed and how the fronted phrase gets interpreted. The objective is to examine whether the verb is strictly necessary when interpreting information on-line. Bearing this in mind, I examine how the different processing theories account for the results of an experiment on clitic pronouns in Spanish. I compare parsing theories that presuppose the existence of gaps, parsing theories that presuppose a direct semantic association, and HPSG theories against the Spanish data, and conclude that there is pre-verbal information, such as clitic pronouns, that can be used to interpret displaced elements in sentence processing.
