166,502 research outputs found
Multi-label classification of text documents using probabilistic topic modeling ml-PLSI
In this paper, we describe an approach to multi-label classification of text documents based on probabilistic topic modeling. A topic model is built on the SCTM-ru corpus using supervised learning, and a multi-label classification algorithm is presented. We also describe the components of a software prototype that implements the proposed approach.
Identifying personality and topics of social media
Title from PDF of title page viewed January 27, 2020. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 37-39). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2019.
Twitter and Facebook are well-known social networking platforms where users post, share, interact, and express their interests, personality, and behavioral information to the world. User-created content on social media can be a source of truth suitable for identifying the personality of social media users. Personality assessment using the Big 5 personality factor model helps organizations identify potential professionals, future leaders, and best-fit candidates for a role, and build effective teams. The Big 5 personality factors also help in understanding depression symptoms among aged people in primary care. We hypothesized that understanding the personality of social network users would significantly benefit topic modeling in areas such as news, and help in understanding community interests and topics.
In this thesis, we present a multi-label personality classification of social media data and a topic feature classification model based on the Big 5 model. We built the Big 5 personality classification model using a Twitter dataset annotated for openness, conscientiousness, extraversion, agreeableness, and neuroticism. We (1) conduct personality detection using the Big 5 model, (2) extract topics from Facebook and Twitter data for each personality, (3) analyze the most important topics, and (4) find the relation between topics and personalities. The results help identify which topics each personality type usually talks about on social media. Multi-label classification is performed with Multinomial Naïve Bayes, Logistic Regression, and Linear SVC. Topic modeling is based on LDA and KATE. Experimental results on Twitter and Facebook data show that the proposed model achieves promising results.
Introduction -- Background and related work -- Proposed framework -- Results and evaluations -- Conclusion and future wor
A Study on Textual Emotion Analysis Based on Deep Learning
Textual emotion recognition (TER) is the process of automatically identifying emotional states in textual expressions. It is a more in-depth analysis than sentiment analysis. Owing to its significant academic and commercial potential, TER has become an essential topic in the field of NLP. Although considerable progress has been made in TER over the past few years, difficulties and challenges remain because of the complexity of human emotion. This thesis explores emotional information by incorporating external knowledge, learning emotion correlation, and building effective TER architectures. The main contributions of this thesis are summarized as follows:
(1) To compensate for the limitations of imbalanced training data, this thesis proposes a multi-stream neural network that incorporates background knowledge for text classification. To better fuse background knowledge into the base network, different fusion strategies are employed among the streams. The experimental results demonstrate that, as a knowledge supplement, background knowledge-based features can compensate for information that the base text classification network neglects or lacks, especially on imbalanced corpora.
(2) To realize contextual emotion learning, this thesis proposes a hierarchical network with label embedding. This network hierarchically encodes the given sentence based on its contextual information. Besides, an auxiliary label embedding matrix is trained for emotion correlation learning with an assembled training objective, contributing to final emotion correlation-based prediction. The experimental results show that the proposed method contributes to emotional feature learning and contextual emotion recognition.
(3) To realize multi-label emotion recognition and emotion correlation learning, this thesis proposes a Multiple-label Emotion Detection Architecture (MEDA). MEDA comprises two modules: a Multi-Channel Emotion-Specified Feature Extractor (MC-ESFE) and an Emotion Correlation Learner (ECorL). MEDA first captures underlying emotion-specified features with the MC-ESFE module. Using these features, the ECorL module learns emotion correlations through an emotion sequence predictor. Furthermore, to incorporate emotion correlation information into model training, a multi-label focal loss is proposed for multi-label learning. The proposed model achieved satisfactory performance and outperformed state-of-the-art models on both the RenCECps and NLPCC2018 datasets, demonstrating the effectiveness of the proposed method for multi-label emotion detection
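The abstract does not give the formula of its multi-label focal loss. As a rough illustration only, the sketch below applies the standard binary focal loss independently to each label; it is an assumption about the general idea, not the thesis's formulation, and it does not model emotion correlation. The function name `multilabel_focal_loss` and the default `gamma=2.0` are hypothetical choices.

```python
import math


def multilabel_focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean focal loss over independent binary labels (illustrative sketch).

    probs   -- predicted probability for each label, in [0, 1]
    targets -- 0/1 ground truth for each label
    gamma   -- focusing parameter; gamma=0 gives plain binary cross-entropy
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability the model assigns to the true outcome.
        p_t = p if y == 1 else 1.0 - p
        # Clamp to avoid log(0) for extreme predictions.
        p_t = min(max(p_t, eps), 1.0)
        # (1 - p_t)^gamma down-weights labels that are already well classified.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)
```

With `gamma=0.0` the function reduces to mean binary cross-entropy; larger values of `gamma` shrink the contribution of confidently correct labels, which is the property that makes focal losses attractive for imbalanced label sets.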
Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation
Topic inference for research proposals aims to obtain the
most suitable disciplinary division from the discipline system defined by a
funding agency. The agency will subsequently find appropriate peer review
experts from their database based on this division. Automated topic inference
can reduce human errors caused by manual topic filling, bridge the knowledge
gap between funding agencies and project applicants, and improve system
efficiency. Existing methods focus on modeling this as a hierarchical
multi-label classification problem, using generative models to iteratively
infer the most appropriate topic information. However, these methods overlook
the gap in scale between interdisciplinary research proposals and
non-interdisciplinary ones, leading to an unjust phenomenon where the automated
inference system categorizes interdisciplinary proposals as
non-interdisciplinary, causing unfairness during the expert assignment. How can
we address this data imbalance issue under a complex discipline system and
hence resolve this unfairness? In this paper, we implement a topic label
inference system based on a Transformer encoder-decoder architecture.
Furthermore, we utilize interpolation techniques to create a series of
pseudo-interdisciplinary proposals from non-interdisciplinary ones during
training based on non-parametric indicators such as cross-topic probabilities
and topic occurrence probabilities. This approach aims to reduce the bias of
the system during model training. Finally, we conduct extensive experiments on
a real-world dataset to verify the effectiveness of the proposed method. The
experimental results demonstrate that our training strategy can significantly
mitigate the unfairness generated in the topic inference task.
Comment: 19 pages, Under review. arXiv admin note: text overlap with
arXiv:2209.1391
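The abstract mentions interpolating non-interdisciplinary proposals into pseudo-interdisciplinary ones but does not specify how. The sketch below shows plain mixup-style linear interpolation of two feature vectors and their multi-hot topic labels. It is a swapped-in, simpler technique for illustration: the paper's selection step based on cross-topic and topic occurrence probabilities is not reproduced, and the name `interpolate` and the parameter `lam` are hypothetical.

```python
def interpolate(features_a, labels_a, features_b, labels_b, lam):
    """Blend two single-topic examples into one pseudo multi-topic example.

    features_* -- numeric feature vectors of equal length
    labels_*   -- multi-hot topic vectors of equal length
    lam        -- mixing weight in [0, 1] given to example A
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(features_a) != len(features_b) or len(labels_a) != len(labels_b):
        raise ValueError("both examples must share feature and label dimensions")
    # Convex combination of the inputs.
    mixed_features = [lam * a + (1.0 - lam) * b
                      for a, b in zip(features_a, features_b)]
    # The same combination of the labels yields soft targets covering both topics.
    mixed_labels = [lam * a + (1.0 - lam) * b
                    for a, b in zip(labels_a, labels_b)]
    return mixed_features, mixed_labels
```

Blending an example labeled only with topic A and one labeled only with topic B produces a training point whose soft label covers both topics, which is one simple way to enlarge the under-represented multi-topic portion of a dataset.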
Multi-label Rule Learning
Research on multi-label classification is concerned with developing and evaluating algorithms that learn a predictive model for the automatic assignment of data points to a subset of predefined class labels. This is in contrast to traditional classification settings, where individual data points cannot be assigned to more than a single class. As many practical use cases demand a flexible categorization of data, where classes need not be mutually exclusive, multi-label classification has become an established topic of machine learning research. Nowadays, it is used for the assignment of keywords to text documents, the annotation of multimedia files, such as images, videos, or audio recordings, as well as for diverse applications in biology, chemistry, social network analysis, or marketing. During the past decade, increasing interest in the topic has resulted in a wide variety of multi-label classification methods. Following the principles of supervised learning, they derive a model from labeled training data, which can afterward be used to obtain predictions for yet unseen data. Besides complex statistical methods, such as artificial neural networks, symbolic learning approaches have not only been shown to provide state-of-the-art performance in many applications but are also a common choice in safety-critical domains that demand human-interpretable and verifiable machine learning models. In particular, rule learning algorithms have a long history of active research in the scientific community. They are often argued to meet the requirements of interpretable machine learning due to the human-legible representation of learned knowledge in terms of logical statements. This work presents a modular framework for implementing multi-label rule learning methods. It not only provides a unified view of existing rule-based approaches to multi-label classification, but also facilitates the development of new learning algorithms.
Two novel instantiations of the framework are investigated to demonstrate its flexibility. Whereas the first one relies on traditional rule learning techniques and focuses on interpretability, the second one is based on a generalization of the gradient boosting framework and focuses on predictive performance rather than the simplicity of models. Motivated by the increasing demand for highly scalable learning algorithms that are capable of processing large amounts of training data, this work also includes an extensive discussion of algorithmic optimizations and approximation techniques for the efficient induction of rules. As the novel multi-label classification methods that are presented in this work can be viewed as instantiations of the same framework, they can both benefit from most of these principles. Their effectiveness and efficiency are compared to existing baselines experimentally
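To make the notion of a multi-label rule concrete, here is a minimal sketch of how a set of already learned rules might be applied to an example. It assumes a rule is a conjunction of threshold conditions (the body) that assigns a set of labels (the head), and that the prediction is the union of the heads of all covering rules. The `Rule` class, the `predict` function, and this aggregation scheme are illustrative assumptions, not the framework presented in the work, and rule induction itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    body: tuple      # conditions of the form (feature_name, operator, threshold)
    head: frozenset  # labels assigned when every condition holds

    def covers(self, example):
        """Return True if the example satisfies all conditions in the body."""
        for feature, op, threshold in self.body:
            value = example[feature]
            if op == "<=":
                satisfied = value <= threshold
            elif op == ">":
                satisfied = value > threshold
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not satisfied:
                return False
        return True


def predict(rules, example):
    """Union of the heads of all rules that cover the example."""
    labels = set()
    for rule in rules:
        if rule.covers(example):
            labels |= rule.head
    return labels
```

Because each rule reads as a logical statement such as "if goals > 0 and length <= 500 then {sports}", the resulting model can be inspected directly, which is the interpretability argument the abstract makes for rule-based approaches.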
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces
We combine multi-task learning and semi-supervised learning by inducing a
joint embedding space between disparate label spaces and learning transfer
functions between label embeddings, enabling us to jointly leverage unlabelled
data and auxiliary, annotated datasets. We evaluate our approach on a variety
of sequence classification tasks with disparate label spaces. We outperform
strong single and multi-task baselines and achieve a new state-of-the-art for
topic-based sentiment analysis.
Comment: To appear at NAACL 2018 (long
…