166,502 research outputs found

    Multi-label classification of text documents using ml-PLSI probabilistic topic modeling

    In this paper, we describe an approach to multi-label classification of text documents based on probabilistic topic modeling. Using the SCTM-ru corpus, a topic model was built with supervised learning. A multi-label classification algorithm is presented, and the components of a software prototype implementing the approach are described.

    Identifying personality and topics of social media

    Title from PDF of title page, viewed January 27, 2020. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 37-39). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2019. Twitter and Facebook are well-known social networking platforms where users post, share, interact, and express their interests, personality, and behavioral information to the world. User-created content on social media can serve as a source of truth suitable for identifying the personality of social media users. Personality assessment using the Big 5 personality factor model helps organizations identify potential professionals, future leaders, and best-fit candidates for a role, and build effective teams. The Big 5 personality factors also help in understanding depression symptoms among aged people in primary care. We hypothesized that understanding the personality of social network users would significantly benefit topic modeling in areas such as news, and help in understanding community interests and topics. In this thesis, we present a multi-label personality classification of social media data and a topic feature classification model based on the Big 5 model. We built the Big 5 personality classification model using a Twitter dataset with labels for openness, conscientiousness, extraversion, agreeableness, and neuroticism. In this thesis, we (1) conduct personality detection using the Big 5 model, (2) extract topics from Facebook and Twitter data for each personality, (3) analyze the most important topics, and (4) find the relation between topics and personalities. The detected personality helps identify which topics each kind of personality usually talks about on social media. Multi-label classification is done using Multinomial Naïve Bayes, Logistic Regression, and Linear SVC. Topic modeling is based on LDA and KATE. Experimental results with Twitter and Facebook data demonstrate that the proposed model achieves promising results. Introduction -- Background and related work -- Proposed framework -- Results and evaluations -- Conclusion and future work

    A Study on Deep Learning-Based Textual Emotion Analysis

    Textual emotion recognition (TER) is the process of automatically identifying emotional states in textual expressions. It is a more in-depth analysis than sentiment analysis. Owing to its significant academic and commercial potential, TER has become an essential topic in the field of NLP. Although considerable progress has been made in TER over the past few years, difficulties and challenges remain because of the complexity of human emotion. This thesis explores emotional information by incorporating external knowledge, learning emotion correlation, and building effective TER architectures. The main contributions of this thesis are summarized as follows: (1) To compensate for imbalanced training data, this thesis proposes a multi-stream neural network that incorporates background knowledge for text classification. To better fuse background knowledge into the basal network, different fusion strategies are employed among the streams. The experimental results demonstrate that, as a knowledge supplement, the background knowledge-based features can make up for information neglected by or absent from the basal text classification network, especially for imbalanced corpora. (2) To realize contextual emotion learning, this thesis proposes a hierarchical network with label embedding. This network hierarchically encodes the given sentence based on its contextual information. In addition, an auxiliary label embedding matrix is trained for emotion correlation learning with an assembled training objective, contributing to the final emotion correlation-based prediction. The experimental results show that the proposed method contributes to emotional feature learning and contextual emotion recognition. (3) To realize multi-label emotion recognition and emotion correlation learning, this thesis proposes a Multiple-label Emotion Detection Architecture (MEDA). MEDA comprises two modules: a Multi-Channel Emotion-Specified Feature Extractor (MC-ESFE) and an Emotion Correlation Learner (ECorL). MEDA first captures underlying emotion-specified features with the MC-ESFE module. With these underlying features, emotion correlation learning is implemented through an emotion sequence predictor in the ECorL module. Furthermore, to incorporate emotion correlation information into model training, a multi-label focal loss is proposed for multi-label learning. The proposed model achieved satisfactory performance and outperformed state-of-the-art models on both the RenCECps and NLPCC2018 datasets, demonstrating the effectiveness of the proposed method for multi-label emotion detection.
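The abstract mentions a multi-label focal loss but does not give its formula. As a rough illustration only, the sketch below applies the standard binary focal loss independently to each label and averages the result. It does not include the emotion-correlation term the thesis describes, and the function name, the `gamma` default, and the clipping constant are assumptions made for this example.

```python
import math

def multilabel_focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean per-label binary focal loss (illustrative, not the thesis's loss).

    probs   -- predicted probability for each label, each in [0, 1]
    targets -- ground-truth 0/1 indicator for each label
    gamma   -- focusing parameter; gamma = 0 reduces to binary cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)       # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p        # probability of the true outcome
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

# Two labels: the first is present, the second is absent.
loss = multilabel_focal_loss([0.9, 0.2], [1, 0])
```

With `gamma` greater than zero, labels that are already predicted well contribute less to the total, so training concentrates on the hard labels.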

    Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation

    Topic inference for research proposals aims to obtain the most suitable disciplinary division from the discipline system defined by a funding agency. The agency then finds appropriate peer review experts in its database based on this division. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods model this as a hierarchical multi-label classification problem, using generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary and non-interdisciplinary research proposals, so the automated inference system tends to categorize interdisciplinary proposals as non-interdisciplinary, causing unfairness during expert assignment. How can we address this data imbalance under a complex discipline system and thereby resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, during training we use interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones, based on non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task. Comment: 19 pages, under review. arXiv admin note: text overlap with arXiv:2209.1391
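The abstract does not specify how the interpolation is carried out. One common way to build a synthetic mixed example from two single-topic examples is a mixup-style convex combination of their feature vectors and label vectors. The sketch below shows only that generic idea. The function name and the fixed mixing weight `lam` are assumptions, and the paper's selection of pairs via cross-topic and topic occurrence probabilities is not reproduced here.

```python
def interpolate_pair(x_a, y_a, x_b, y_b, lam=0.5):
    """Mixup-style interpolation of two examples (hypothetical helper).

    x_a, x_b -- feature vectors of two single-topic proposals
    y_a, y_b -- their label indicator vectors
    lam      -- mixing weight in [0, 1] given to the first example
    """
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix

# Two toy proposals, each belonging to exactly one of two topics.
x_mix, y_mix = interpolate_pair([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], lam=0.5)
```

The mixed label vector carries weight on both topics, which is what makes the synthetic example resemble an interdisciplinary proposal.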

    Multi-label Rule Learning

    Research on multi-label classification is concerned with developing and evaluating algorithms that learn a predictive model for the automatic assignment of data points to a subset of predefined class labels. This is in contrast to traditional classification settings, where individual data points cannot be assigned to more than a single class. As many practical use cases demand a flexible categorization of data, where classes need not be mutually exclusive, multi-label classification has become an established topic of machine learning research. Nowadays, it is used for the assignment of keywords to text documents, the annotation of multimedia files, such as images, videos, or audio recordings, as well as for diverse applications in biology, chemistry, social network analysis, or marketing. During the past decade, increasing interest in the topic has resulted in a wide variety of different multi-label classification methods. Following the principles of supervised learning, they derive a model from labeled training data, which can afterward be used to obtain predictions for yet unseen data. Besides complex statistical methods, such as artificial neural networks, symbolic learning approaches have not only been shown to provide state-of-the-art performance in many applications but are also a common choice in safety-critical domains that demand human-interpretable and verifiable machine learning models. In particular, rule learning algorithms have a long history of active research in the scientific community. They are often argued to meet the requirements of interpretable machine learning due to the human-legible representation of learned knowledge in terms of logical statements. This work presents a modular framework for implementing multi-label rule learning methods. It not only provides a unified view of existing rule-based approaches to multi-label classification, but also facilitates the development of new learning algorithms. 
Two novel instantiations of the framework are investigated to demonstrate its flexibility. Whereas the first one relies on traditional rule learning techniques and focuses on interpretability, the second one is based on a generalization of the gradient boosting framework and focuses on predictive performance rather than the simplicity of models. Motivated by the increasing demand for highly scalable learning algorithms that are capable of processing large amounts of training data, this work also includes an extensive discussion of algorithmic optimizations and approximation techniques for the efficient induction of rules. As the novel multi-label classification methods that are presented in this work can be viewed as instantiations of the same framework, they can both benefit from most of these principles. Their effectiveness and efficiency are compared to existing baselines experimentally.
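To make the idea of rules as human-legible logical statements concrete, here is a minimal sketch of how a multi-label rule model might produce predictions: each rule pairs a condition on the features with a set of labels, and the prediction is the union of the labels of all rules that fire. The rules, feature names, and labels are invented for illustration, and this is not the framework or the learning algorithm described in the work above.

```python
from typing import Callable, Dict, List, Set, Tuple

Features = Dict[str, float]
Rule = Tuple[Callable[[Features], bool], Set[str]]

# Hand-written example rules (hypothetical); a learner would induce these from data.
RULES: List[Rule] = [
    (lambda x: x.get("goal", 0) > 0, {"sports"}),
    (lambda x: x.get("election", 0) > 0, {"politics"}),
    (lambda x: x.get("stadium", 0) > 0 and x.get("budget", 0) > 0,
     {"sports", "economy"}),
]

def predict(x: Features) -> Set[str]:
    """Return the union of label sets of every rule whose condition holds."""
    labels: Set[str] = set()
    for condition, head in RULES:
        if condition(x):
            labels |= head
    return labels

# A document mentioning both "goal" and "election" receives two labels.
predicted = predict({"goal": 2, "election": 1})
```

Because the label sets are combined rather than forced to compete, a single example can receive zero, one, or several labels.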

    Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

    We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis. Comment: To appear at NAACL 2018 (long