    Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

    Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines whether a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, making it possible to disambiguate sentences in a unified fashion across the different ambiguity types. (Comment: EMNLP 2015)
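
    A minimal sketch of the selection step this abstract describes, assuming a hypothetical `alignment_score` stand-in for the paper's sentence-video model: each candidate reading is scored against the video and the best-scoring one is returned. The paraphrased readings below are invented examples, not the paper's data.

```python
# Illustrative sketch only: pick the interpretation a video depicts best.
# `alignment_score` is an assumed stand-in for a sentence-video alignment
# model; the paraphrased readings below are invented examples.
from typing import Callable, List


def disambiguate(
    interpretations: List[str],                 # one paraphrase per reading
    video: object,                              # precomputed scene features
    alignment_score: Callable[[str, object], float],
) -> str:
    """Return the interpretation that the video depicts best."""
    return max(interpretations, key=lambda s: alignment_score(s, video))


# Usage with a dummy scorer that simply prefers longer sentences:
readings = [
    "The man saw the woman who was holding a telescope.",
    "The man used a telescope to see the woman.",
]
print(disambiguate(readings, None, lambda s, v: float(len(s))))
```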

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. (Comment: Pre-print of the paper accepted at the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770)
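
    To make the knowledge-plus-learning point concrete, a toy sketch (not the paper's system): surface tokens are augmented with concepts looked up in a small, invented knowledge base, so a downstream model has signal even when labeled examples are scarce.

```python
# Toy sketch (not the paper's system): augment surface features with
# background knowledge so a model generalizes with little training data.
# The term -> concept map below is an invented, hypothetical knowledge base.
from typing import Dict, List

KNOWLEDGE_BASE: Dict[str, str] = {
    "insulin": "MEDICATION",
    "metformin": "MEDICATION",
    "fatigue": "SYMPTOM",
}


def featurize(text: str) -> List[str]:
    """Return surface tokens plus knowledge-derived concept features."""
    tokens = text.lower().split()
    concepts = [KNOWLEDGE_BASE[t] for t in tokens if t in KNOWLEDGE_BASE]
    return tokens + concepts  # a downstream classifier sees both views


print(featurize("patient reports fatigue after taking insulin"))
# ['patient', 'reports', 'fatigue', 'after', 'taking', 'insulin',
#  'SYMPTOM', 'MEDICATION']
```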

    Two Uummarmiutun modals – including a brief comparison with Utkuhikšalingmiutut cognates

    The paper is concerned with the meaning of two modal postbases in Uummarmiutun, hungnaq ‘probably’ and ȓukȓau ‘should’. Uummarmiutun is an Inuktut dialect spoken in the Western Arctic. The analyses are founded on knowledge shared by native speakers of Uummarmiutun. Their statements and elaborations are quoted throughout the paper to show how they have explained the meaning nuances of modal expressions in their language. The paper also includes a comparison with cognates in Utkuhikšalingmiutut, which belongs to the eastern part of the Western Canadian dialect group (Dorais, 2010). Using categories from Cognitive Functional Linguistics (Boye, 2005, 2012), the paper shows which meanings are covered by hungnaq and ȓukȓau. This allows us to discover subtle differences between the meanings of Uummarmiutun hungnaq and ȓukȓau and their Utkuhikšalingmiutut cognates respectively.

    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role for the compositional power of language. (Comment: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197)
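
    As a concrete illustration of the combination methods such surveys catalogue, a minimal NumPy sketch of early fusion (concatenating modality vectors) versus late fusion (merging per-modality scores); the vector sizes and scoring functions are arbitrary assumptions, not the survey's specification.

```python
# Minimal sketch of two common ways to combine modality representations:
# early fusion (one joint vector) and late fusion (merge per-modality
# predictions). Dimensions and scorers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
text_vec = rng.normal(size=300)    # e.g. a sentence embedding
image_vec = rng.normal(size=512)   # e.g. a CNN image feature

# Early fusion: concatenate, then feed one joint model downstream.
joint = np.concatenate([text_vec, image_vec])        # shape (812,)

# Late fusion: score each modality separately, then merge the scores.
score_text = float(np.tanh(text_vec.mean()))
score_image = float(np.tanh(image_vec.mean()))
late_score = 0.5 * (score_text + score_image)

print(joint.shape, late_score)
```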

    Crossmodal content binding in information-processing architectures

    Operating in a physical context, an intelligent robot faces two fundamental problems. First, it needs to combine information from its different sensors to form a representation of the environment that is more complete than any single sensor could provide on its own. Second, it needs to combine high-level representations (such as those for planning and dialogue) with its sensory information, to ensure that the interpretations of these symbolic representations are grounded in the situated context. Previous approaches to this problem have used techniques such as (low-level) information fusion, ontological reasoning, and (high-level) concept learning. This paper presents a framework in which these, and other approaches, can be combined to form a shared representation of the current state of the robot in relation to its environment and other agents. Preliminary results from an implemented system are presented to illustrate how the framework supports behaviours commonly required of an intelligent robot.
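
    A schematic sketch of the binding idea: per-modality proxies carrying attribute features are merged into shared binding unions whenever their features do not conflict. The data structures, names, and compatibility test are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of crossmodal content binding: merge per-modality
# proxies into a shared "binding union" when their features are consistent.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    modality: str             # e.g. "vision", "dialogue"
    features: Dict[str, str]  # e.g. {"colour": "red", "shape": "ball"}


@dataclass
class BindingUnion:
    proxies: List[Proxy] = field(default_factory=list)

    def compatible(self, p: Proxy) -> bool:
        # Consistent iff no shared attribute has conflicting values.
        for q in self.proxies:
            for k, v in p.features.items():
                if k in q.features and q.features[k] != v:
                    return False
        return True


def bind(proxies: List[Proxy]) -> List[BindingUnion]:
    """Greedily assign each proxy to the first compatible union."""
    unions: List[BindingUnion] = []
    for p in proxies:
        for u in unions:
            if u.compatible(p):
                u.proxies.append(p)
                break
        else:
            unions.append(BindingUnion([p]))
    return unions


seen = Proxy("vision", {"colour": "red", "shape": "ball"})
heard = Proxy("dialogue", {"colour": "red"})
print(len(bind([seen, heard])))  # 1: the two proxies are bound as co-referring
```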

    Adaptive Sentence Boundary Disambiguation

    Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them, most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts. (Comment: The software from the work described in this paper is available by contacting [email protected])
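
    A minimal sketch of the feature construction this abstract implies: each word in a window around a candidate period contributes its part-of-speech probability vector from a lexicon, and the concatenation feeds a small feed-forward network. The toy lexicon, window size, and untrained weights are assumptions; training is omitted.

```python
# Illustrative sketch: classify a candidate period as a sentence boundary
# from the POS-probability vectors of its context words. The lexicon and
# the (untrained, random) network weights are invented for illustration.
from typing import List

import numpy as np

POS = ["noun", "verb", "det", "punct"]  # tag order for probability vectors
LEXICON = {                             # hypothetical P(tag | word) entries
    "mr.": [0.9, 0.0, 0.0, 0.1],
    "said": [0.0, 1.0, 0.0, 0.0],
    "the": [0.0, 0.0, 1.0, 0.0],
}
UNKNOWN = [0.25, 0.25, 0.25, 0.25]      # uniform back-off for unseen words


def features(context: List[str]) -> np.ndarray:
    """Concatenate POS-probability vectors for the context window."""
    return np.concatenate([LEXICON.get(w.lower(), UNKNOWN) for w in context])


rng = np.random.default_rng(0)
x = features(["Mr.", "said"])           # words flanking a candidate period
W1, b1 = rng.normal(size=(8, x.size)), np.zeros(8)
W2, b2 = rng.normal(size=8), 0.0
hidden = np.tanh(W1 @ x + b1)           # feed-forward hidden layer
p_boundary = 1 / (1 + np.exp(-(W2 @ hidden + b2)))  # sigmoid output
print(f"P(sentence boundary) = {p_boundary:.2f}")
```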

    Emergence phenomena in German W-immer/auch-subordinators

    The present study is concerned with the distributional patterns of the irrelevance particles immer ‘ever’ and auch ‘also’ in German universal concessive conditionals and free relatives (e.g. was immer er auch sagt ‘whatever he says’). Whereas irrelevance is conveyed by a single element in a fixed position in languages like English (-ever), immer and auch occur in multiple positions and combinations. Following the example of Leuschner (2000), the distribution of the particles and their combinations is documented and explained using functional motivations. Compared with Leuschner (2000), however, the present study is based on a much larger sample of 23,299 clauses with the W-words was and wer (including their inflected forms) from the DeReKo corpus, allowing for a far more detailed statistical analysis. Special attention is devoted to the distribution of immer and auch (including their combinations) in full subordinate clauses vs. elliptically reduced forms, and to the nature of the resulting patterns as a case of emergent grammar.

    A layered abduction model of perception: Integrating bottom-up and top-down processing in a multi-sense agent

    A layered-abduction model of perception is presented which unifies bottom-up and top-down processing in a single logical and information-processing framework. The process of interpreting the input from each sense is broken down into discrete layers of interpretation, where at each layer a best-explanation hypothesis is formed of the data presented by the layer or layers below, with the help of information available laterally and from above. The formation of this hypothesis is treated as a problem of abductive inference, similar to diagnosis and theory formation. This model thus brings a knowledge-based problem-solving approach to the analysis of perception, treating perception as a kind of compiled cognition. The bottom-up passing of information from layer to layer defines channels of information flow, which separate and converge in a specific way for each sense modality. Multi-modal perception occurs where channels converge from more than one sense. This model has not yet been implemented, though it is based on systems that have been successful in medical and mechanical diagnosis and medical test interpretation.
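
    A schematic sketch of per-layer best-explanation selection as described above, with invented hypothesis sets and scores: each layer picks the hypothesis that best explains the data arriving from below, weighted by top-down expectations from above.

```python
# Schematic sketch of layered abduction with invented numbers: at each
# layer, choose the hypothesis maximizing P(data | hyp) * top-down prior.
# The hypothesis sets and scores are illustrative assumptions only.
from typing import Dict


def best_explanation(
    data: str,
    hypotheses: Dict[str, Dict[str, float]],  # hyp -> {data: P(data | hyp)}
    prior: Dict[str, float],                  # top-down expectation per hyp
) -> str:
    """Pick the hypothesis with the highest explanatory score."""
    return max(hypotheses, key=lambda h: hypotheses[h].get(data, 0.0) * prior[h])


# Two stacked layers: acoustic evidence -> phoneme -> word.
phoneme = best_explanation(
    "formant_pattern_A",
    {"/ae/": {"formant_pattern_A": 0.8}, "/eh/": {"formant_pattern_A": 0.4}},
    prior={"/ae/": 0.5, "/eh/": 0.5},
)
word = best_explanation(
    phoneme,
    {"cat": {"/ae/": 0.9}, "pet": {"/eh/": 0.7}},
    prior={"cat": 0.6, "pet": 0.4},  # e.g. context from a higher layer
)
print(phoneme, word)
```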