72 research outputs found

    Multi-modal adversarial autoencoders for recommendations of citations and subject labels

    Get PDF
    We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence in the sense that item co-occurrence resembles relatedness in case of citations, yet implies diversity in case of subject labels. Our results reveal that supplying the partial item set as input is only helpful, when item co-occurrence resembles relatedness. When facing a new recommendation task it is therefore crucial to consider the semantics of item co-occurrence for the choice of an appropriate model

    Recommendations for item set completion: On the semantics of item co-occurrence with data sparsity, input size, and input modalities

    Get PDF
    We address the problem of recommending relevant items to a user in order to "complete" a partial set of items already known. We consider the two scenarios of citation and subject label recommendation, which resemble different semantics of item co-occurrence: relatedness for co-citations and diversity for subject labels. We assess the influence of the completeness of an already known partial item set on the recommender performance. We also investigate data sparsity through a pruning parameter and the influence of using additional metadata. As recommender models, we focus on different autoencoders, which are particularly suited for reconstructing missing items in a set. We extend autoencoders to exploit a multi-modal input of text and structured data. Our experiments on six real-world datasets show that supplying the partial item set as input is helpful when item co-occurrence resembles relatedness, while metadata are effective when co-occurrence implies diversity. This outcome means that the semantics of item co-occurrence is an important factor. The simple item co-occurrence model is a strong baseline for citation recommendation. However, autoencoders have the advantage to enable exploiting additional metadata besides the partial item set as input and achieve comparable performance. For the subject label recommendation task, the title is the most important attribute. Adding more input modalities sometimes even harms the result. In conclusion, it is crucial to consider the semantics of the item co-occurrence for the choice of an appropriate recommendation model and carefully decide which metadata to exploit

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

    Get PDF
    [...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...

    Training researchers with the MOVING platform

    Get PDF
    The MOVING platform enables its users to improve their information literacy by training how to exploit data and text mining methods in their daily research tasks. In this paper, we show how it can support researchers in various tasks, and we introduce its main features, such as text and video retrieval and processing, advanced visualizations, and the technologies to assist the learning process

    Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities

    Full text link
    Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods

    Building Up Recommender Systems By Deep Learning For Cognitive Services

    Full text link
    Cognitive services provide artificial intelligence (AI) technology for application developers, who are not required to be experts on machine learning. Cognitive services are presented as an integrated service platform where end users bring abilities such as seeing, hearing, speaking, searching, user profiling, etc. to their own applications under development via simple API calls. As one of the above abilities, recommender systems serve as an indispensable building brick, especially when it comes to the information retrieval functionality in the cognitive service platform. This thesis focuses on the novel recommendation algorithms that are able to improve on recommendation quality measured by accuracy metrics, e.g., precision and recall, with advanced deep learning techniques. Recent deep learning-based recommendation models have been proved to have state-ofthe-art recommendation quality in a host of recommendation scenarios, such as rating prediction tasks, top-N ranking tasks, sequential recommendation, etc. Many of them only leverage the existing information acquired from users’ past behaviours to model them and make one or a set of predictions on the users’ next choice. Such information is normally sparse so that an accurate user behaviour model is often difficult to obtain even with deep learning. To overcome this issue, we invent various adversarial techniques and apply them to deep learning recommendation models in different scenarios. Some of these techniques involve generative models to address data sparsity and some improve user behaviour modelling by introducing an adversarial opponent in model training. We empirically show the effectiveness of our novel techniques and the enhancement achieved over existing models via thorough experiments and ablation studies on widely adopted recommendation datasets. The contributions in this thesis are as follows: 1. Propose the adversarial collaborative auto-encoder model for top-N recommendation; 2. Propose a novel deep domain adaptation cross-domain recommendation model for rating prediction tasks via transfer learning; 3. Propose a novel adversarial noise layer for convolutional neural networks and a convolutional generative adversarial model for top-N recommendation

    Addressing Variability in Speech when Recognizing Emotion and Mood In-the-Wild

    Full text link
    Bipolar disorder is a chronic mental illness, affecting 4% of Americans, that is characterized by periodic mood changes ranging from severe depression to extreme compulsive highs. Both mania and depression profoundly impact the behavior of affected individuals, resulting in potentially devastating personal and social consequences. Bipolar disorder is managed clinically with regular interactions with care providers, who assess mood, energy levels, and the form and content of speech. Recent work has proposed smartphones for automatically monitoring mood using speech. Much of the early work in speech-centered mood detection has been done in the laboratory or clinic and is not reflective of the variability found in real-world conversations and conditions. Outside of these settings, automatic mood detection is hard, as the recordings include environmental noise, differences in recording devices, and variations in subject speaking patterns. Without addressing these issues, it is difficult to move towards a passive mobile health system. My research works to address this variability present in speech so that such a system can be created, allowing for interventions to mitigate the life-changing effects of mood transitions. However detecting mood directly from speech is difficult, as mood varies over the course of days or weeks, while speech fluctuates rapidly. To address this, my thesis explores how an intermediate step can be used to aid in this prediction. For example, one of the major symptoms of bipolar disorder is emotion dysregulation - changes in the way emotions are perceived and a lack of inhibition in their expression. My work has supported the relationship between automatically extracted emotion estimates and mood. Because of this, my thesis explores how to mitigate the variability found when detecting emotion from speech. The remainder of my thesis is focused on employing these emotion-based features, as well as features based on language content, to real-world applications. This dissertation is divided into the following parts: Part I: I address the direct classification of mood from speech. This is accomplished by addressing variability due to recording device using preprocessing and multi-task learning. I then show how both subject-specific and population-general information can be combined to significantly improve mood detection. Part II: I explore the automatic detection of emotion from speech and how to control for the other factors of variability present in the speech signal. I use progressive networks as a method to augment emotion with other paralinguistic data including gender and speaker, as well as other datasets. Additionally, I introduce a novel domain generalization method for cross-corpus detection. Part III: I demonstrate real-world applications of speech mood monitoring using everyday conversations. I show how the previously introduced generalized model can predict emotion from the speech of individuals with suicidal ideation, demonstrating its effectiveness across domains. Furthermore, I use these predictions to distinguish individuals with suicidal thoughts from healthy controls. Lastly, I introduce a novel framework for intervention detection in individuals with bipolar disorder. I then create a natural speech mood monitoring system based on features derived from measures of emotion and automatic speech recognition (ASR) transcripts and show effective intervention detection. I conclude this dissertation with the following future directions: (1) Extending my emotion generalization system to include multiple modalities and factors of variability; (2) Expanding natural speech mood monitoring by including more devices, exploring other data besides speech, and investigating mood rating causality.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/153461/1/gideonjn_1.pd

    Fifteenth Biennial Status Report: March 2019 - February 2021

    Get PDF
    • …
    corecore