    On the Complementarity of Images and Text for the Expression of Emotions in Social Media

    Authors of social media posts communicate their emotions, and what causes them, with text and images. While there is work on emotion and stimulus detection for each modality separately, it is still unknown whether the modalities contain complementary emotion information in social media. We aim to fill this research gap and contribute a novel, annotated corpus of English multimodal Reddit posts. On this resource, we develop models to automatically detect the relation between image and text, the emotion stimulus category, and the emotion class. We evaluate whether these tasks require both modalities and find, for the image-text relations, that text alone is sufficient for most categories (complementary, illustrative, opposing): the information in the text allows predicting whether an image is required for emotion understanding. The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise. Stimuli depicted by objects, animals, food, or a person are best predicted by image-only models, while multimodal models are most effective on art, events, memes, places, or screenshots.
    Comment: accepted for WASSA 2022 at ACL 2022
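
As a rough illustration of the modelling setup such a study implies (not the authors' released code; the feature dimensions, hidden size, and five-class label set below are placeholders), a minimal late-fusion classifier could combine pre-computed text and image feature vectors like this:

```python
# Hypothetical sketch: late fusion of pre-computed text and image features
# for emotion classification. Dimensions and class count are placeholders.
import torch
import torch.nn as nn

class LateFusionEmotionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=256, num_classes=5):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # project text encoder output
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # project image encoder output
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, num_classes),         # fused features -> emotion logits
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([self.text_proj(text_feat), self.image_proj(image_feat)], dim=-1)
        return self.classifier(fused)

# Toy usage with random tensors standing in for encoder outputs.
model = LateFusionEmotionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 5])
```

A text-only or image-only baseline, of the kind compared in the abstract, would simply drop one of the two projection branches.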

    Visually Grounded Meaning Representations

    In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We evaluate our model on its ability to simulate word similarity judgments and concept categorization. On both tasks, our model yields a better fit to behavioral data compared to baselines and related models which either rely on a single modality or do not make use of attribute-based input.
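
A much-simplified sketch of this kind of architecture (not the paper's implementation; the layer sizes and attribute dimensionalities are invented) encodes each modality's attribute vector separately and maps the two codes into a common embedding that is trained to reconstruct both inputs:

```python
# Hypothetical bimodal stacked autoencoder: unimodal encoders feed a shared
# layer whose output is the common embedding; decoders reconstruct both inputs.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, text_dim=500, visual_dim=636, code_dim=128, joint_dim=100):
        super().__init__()
        self.enc_text = nn.Sequential(nn.Linear(text_dim, code_dim), nn.Sigmoid())
        self.enc_vis = nn.Sequential(nn.Linear(visual_dim, code_dim), nn.Sigmoid())
        self.enc_joint = nn.Sequential(nn.Linear(2 * code_dim, joint_dim), nn.Sigmoid())
        self.dec_joint = nn.Sequential(nn.Linear(joint_dim, 2 * code_dim), nn.Sigmoid())
        self.dec_text = nn.Linear(code_dim, text_dim)
        self.dec_vis = nn.Linear(code_dim, visual_dim)

    def forward(self, text_attrs, visual_attrs):
        h = torch.cat([self.enc_text(text_attrs), self.enc_vis(visual_attrs)], dim=-1)
        z = self.enc_joint(h)                         # common multimodal embedding
        h_rec = self.dec_joint(z)
        half = h_rec.shape[1] // 2
        return z, self.dec_text(h_rec[:, :half]), self.dec_vis(h_rec[:, half:])

# Toy usage: the training signal is reconstruction of both attribute vectors.
text_attrs, visual_attrs = torch.rand(8, 500), torch.rand(8, 636)
z, text_rec, vis_rec = BimodalAutoencoder()(text_attrs, visual_attrs)
loss = nn.functional.mse_loss(text_rec, text_attrs) + nn.functional.mse_loss(vis_rec, visual_attrs)
```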

    Learning visually grounded meaning representations

    Humans possess a rich semantic knowledge of words and concepts which captures the perceivable physical properties of their real-world referents and their relations. Encoding this knowledge or some of its aspects is the goal of computational models of semantic representation and has been the subject of considerable research in cognitive science, natural language processing, and related areas. Existing models have placed emphasis on different aspects of meaning, depending ultimately on the task at hand. Typically, such models have been used in tasks addressing the simulation of behavioural phenomena, e.g., lexical priming or categorisation, as well as in natural language applications, such as information retrieval, document classification, or semantic role labelling. A major strand of research popular across disciplines focuses on models which induce semantic representations from text corpora. These models are based on the hypothesis that the meaning of words is established by their distributional relation to other words (Harris, 1954). Despite their widespread use, distributional models of word meaning have been criticised as ‘disembodied’ in that they are not grounded in perception and action (Perfetti, 1998; Barsalou, 1999; Glenberg and Kaschak, 2002). This lack of grounding contrasts with many experimental studies suggesting that meaning is acquired not only from exposure to the linguistic environment but also from our interaction with the physical world (Landau et al., 1998; Bornstein et al., 2004). This criticism has led to the emergence of new models aiming at inducing perceptually grounded semantic representations. Essentially, existing approaches learn meaning representations from multiple views corresponding to different modalities, i.e. linguistic and perceptual input. To approximate the perceptual modality, previous work has relied largely on semantic attributes collected from humans (e.g., is round, is sour), or on automatically extracted image features. Semantic attributes have a long-standing tradition in cognitive science and are thought to represent salient psychological aspects of word meaning including multisensory information. However, their elicitation from human subjects limits the scope of computational models to a small number of concepts for which attributes are available. In this thesis, we present an approach which draws inspiration from the successful application of attribute classifiers in image classification, and represent images and the concepts depicted by them by automatically predicted visual attributes. To this end, we create a dataset comprising nearly 700K images and a taxonomy of 636 visual attributes and use it to train attribute classifiers. We show that their predictions can act as a substitute for human-produced attributes without any critical information loss. In line with the attribute-based approximation of the visual modality, we represent the linguistic modality by textual attributes which we obtain with an off-the-shelf distributional model. Having first established this core contribution of a novel modelling framework for grounded meaning representations based on semantic attributes, we show that these can be integrated into existing approaches to perceptually grounded representations. We then introduce a model which is formulated as a stacked autoencoder (a variant of multilayer neural networks), which learns higher-level meaning representations by mapping words and images, represented by attributes, into a common embedding space. 
In contrast to most previous approaches to multimodal learning using different variants of deep networks and data sources, our model is defined at a finer level of granularity: it computes representations for individual words and is unique in its use of attributes as a means of representing the textual and visual modalities. We evaluate the effectiveness of the representations learnt by our model by assessing its ability to account for human behaviour on three semantic tasks, namely word similarity, concept categorisation, and typicality of category members. With respect to the word similarity task, we focus on the model’s ability to capture similarity in both the meaning and appearance of the words’ referents. Since existing benchmark datasets on word similarity do not distinguish between these two dimensions and often contain abstract words, we create a new dataset in a large-scale experiment where participants are asked to give two ratings per word pair expressing their semantic and visual similarity, respectively. Experimental results show that our model learns meaningful representations which are more accurate than models based on individual modalities or different modality integration mechanisms. The presented model is furthermore able to predict textual attributes for new concepts given their visual attribute predictions only, which we demonstrate by comparing model output with human-generated attributes. Finally, we show the model’s effectiveness in an image-based task on visual category learning, in which images are used as a stand-in for real-world objects.
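
The word-similarity evaluation mentioned above amounts to correlating model similarities with human ratings. A minimal sketch, with random placeholder embeddings and invented ratings rather than the thesis data:

```python
# Compare cosine similarities of learned word embeddings against human
# similarity judgments via Spearman correlation (placeholder data only).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=100) for w in ["dog", "cat", "car", "banana"]}
pairs = [("dog", "cat"), ("dog", "car"), ("car", "banana")]
human_ratings = [4.5, 1.8, 1.2]  # invented semantic-similarity ratings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
rho, _ = spearmanr(model_scores, human_ratings)
print(f"Spearman rho: {rho:.2f}")
```

The same procedure would be repeated with the visual-similarity ratings to probe the second dimension of the dataset.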

    Kundenkenntnis im Handel - Ausprägungen, Herkunft und Wirkungen

    The importance of customer knowledge for a supplier's market success is undisputed. Nevertheless, we know little about how well suppliers actually know their customers. This also holds for retailing. The publication presented here therefore examines the extent of customer knowledge in retailing, as well as the sources and determinants of this knowledge and its effects on both the supplier and the customer side. All possible sources of customer knowledge are considered, above all customer contacts. In detail, the work not only draws on relevant theories but also presents first, differentiated research findings on the extent of customer knowledge, on its influencing factors, and on its effects. Conclusions for future research and for knowledge management in retailing round off the work.

    Grounded Models of Semantic Representation

    A popular tradition of studying semantic representation has been driven by the assumption that word meaning can be learned from the linguistic environment, despite ample evidence suggesting that language is grounded in perception and action. In this paper we present a comparative study of models that represent word meaning based on linguistic and perceptual data. Linguistic information is approximated by naturally occurring corpora and sensorimotor experience by feature norms (i.e., attributes native speakers consider important in describing the meaning of a word). The models differ in terms of the mechanisms by which they integrate the two modalities. Experimental results show that a closer correspondence to human data can be obtained by uncovering latent information shared among the textual and perceptual modalities rather than arriving at semantic knowledge by concatenating the two.
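
One common way to "uncover latent information shared among the textual and perceptual modalities", as opposed to simple concatenation, is canonical correlation analysis; whether CCA is the exact mechanism used in the paper is an assumption here. A toy contrast of the two integration strategies on random stand-in matrices:

```python
# Two ways to integrate a textual view and a feature-norm view of the same
# words: concatenation vs. a shared latent space found by CCA (toy data).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_words = 200
text_view = rng.normal(size=(n_words, 100))  # stand-in corpus-based vectors
norm_view = rng.normal(size=(n_words, 40))   # stand-in feature-norm vectors

concatenated = np.hstack([text_view, norm_view])        # baseline: (200, 140)

cca = CCA(n_components=10, max_iter=1000)
text_latent, norm_latent = cca.fit_transform(text_view, norm_view)
fused = (text_latent + norm_latent) / 2                 # shared latent space: (200, 10)
print(concatenated.shape, fused.shape)
```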

    Models of Semantic Representation with Visual Attributes

    We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
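
The attribute-classifier step can be illustrated with independent binary classifiers over image features; the data below are random stand-ins, not the paper's dataset or features:

```python
# Hypothetical sketch: one binary classifier per visual attribute; their
# predicted probabilities form an attribute vector per image (random data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 128))         # stand-in image feature vectors
Y = rng.integers(0, 2, size=(1000, 10))  # stand-in multi-label attribute annotations

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)
attribute_scores = clf.predict_proba(X[:5])  # (5, 10): one attribute vector per image
print(attribute_scores.shape)
```

The predicted attribute vectors would then be combined with text-based distributional vectors, as the abstract describes.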

    Entwicklung eines elektrischen Carsharing-Angebots für den ländlichen Raum

    Across Germany, carsharing has the potential to contribute to reducing greenhouse gas emissions in the transport sector. It is questionable, however, whether successful carsharing services can also be developed for rural areas. To find out which particular challenges exist from the users' point of view in rural areas, an electric carsharing service was developed in a participatory process with the residents of a village in a rural area. According to a survey of 190 residents, important factors influencing the use of the carsharing service were its usefulness in everyday life, the enjoyment of using it, and the accessibility of its location. Qualitative interviews with 21 residents confirmed these results and provided details on the reasons behind them. In addition, focusing on specific target groups was seen as important. Building on this, concrete ideas for suitable carsharing models were developed in subsequent workshops with residents. Among other things, participants wanted an app with a feature for offering lifts, in order to foster a sense of community around carsharing use. Overall, it becomes apparent that residents in rural areas have a number of specific requirements for a carsharing service, most of which share a social orientation. To give a carsharing service in rural areas the best chance of success, these wishes should be addressed with tailor-made solutions.