10 research outputs found

    SMILE : Twitter emotion classification using domain adaptation

    Despite widespread research interest in social media sentiment analysis, sentiment and emotion classification across different domains and on Twitter data remains a challenging task. Here we set out to find an effective approach for tackling a cross-domain emotion classification task on a set of Twitter data involving social media discourse around arts and cultural experiences, in the context of museums. While most existing work in domain adaptation has focused on feature-based and/or instance-based adaptation methods, in this work we study a model-based adaptive SVM approach, as we believe its flexibility and efficiency are more suitable for the task at hand. We conduct a series of experiments and compare our system with a set of baseline methods. Our results not only show superior performance in terms of accuracy and computational efficiency compared to the baselines, but also shed light on how different ratios of labelled target-domain data used for adaptation can affect classification performance.

    Multimodal Sentiment Analysis Based on Deep Learning: Recent Progress

    Multimodal sentiment analysis is an important research topic in the field of NLP, aiming to analyze speakers' sentiment tendencies through features extracted from textual, visual, and acoustic modalities. Its main methods are based on machine learning and deep learning. Machine learning-based methods rely heavily on labeled data, whereas deep learning-based methods can overcome this shortcoming and capture the in-depth semantic information and modal characteristics of the data, as well as the interactive information between multimodal data. In this paper, we survey the deep learning-based methods, including fusion of text and image and fusion of text, image, audio, and video. Specifically, we discuss the main problems of these methods and the future directions. Finally, we review the work on multimodal sentiment analysis in conversation.

    Learning Representations of Social Media Users

    User representations are routinely used in recommendation systems by platform developers, targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly: how does one extract a stable user representation, effective for many downstream tasks, from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g. does it improve classifier performance, or make more accurate recommendations in a recommendation system) but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person's demographic features, socioeconomic class, or mental health state? Is it predictive of the user's future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them on three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations (ground truth gender and mental health features) as auxiliary tasks to improve mental health state prediction. We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message. Comment: PhD thesis

    The Camera in conservation: determining photography’s place in the preservation of wildlife

    This MA by research study reflects on photography’s past, current and future role within wildlife conservation, and on whether there is indeed a necessity for it moving forwards. The following investigation and analysis seeks to establish how the photographic medium can be both beneficial and harmful to the preservation of wildlife, and how best photographers can use it in future conservation projects. Several significant aspects of photography and external influences are engaged with in this study. First, I investigate the importance of empathy within wildlife conservation and how it can be elicited through imagery and photographic methods. Furthermore, I investigate the other side of conservation photography’s success, analysing what negative or neutral impacts it can bring with it, before researching the role that social media plays, and has the potential to play, in conservation, and how photography can adapt to it to maximise its success. Lastly, I explore alternative visual media such as moving image, and how photography can best apply successful techniques learned from them to reinterpret how conservation photography is perceived. Finally, using information and research from across my thesis, I have produced a ‘guide’ as to how conservation photography can be shaped to achieve its full potential for success, drawing upon previous successes and failures of other conservation attempts and photographers.

    Modelo computacional e sua implementação para identificação de perfil de personalidade baseado em textos educacionais

    Advisor: Prof. Dr. Andrey Ricardo Pimentel. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 14/09/2018. Includes references: p. 132-144. Area of concentration: Computer Science.
    Abstract: Identifying students' personality profiles, taking their differences into account, supports educators in finding suitable learning situations for each student. Although this can be done intuitively in small face-to-face classes, it is a significant challenge for large distance-learning groups. One way of identifying a personality profile is the personality inventory, in which students answer a series of questions that are later evaluated, generating personality profile indicators according to a specific model. In contrast to these manually applied inventories, non-intrusive methods have been developed, based for example on identifying personality clues that individuals leave in the texts they produce. With machine learning, the clues identified in a text can be compared with clues found in reference databases in which a prior manual identification process was carried out, and the personality profile of the text's author can thus be inferred. This research presents a model, named IP3, that identifies students' personality profiles automatically and non-intrusively, using only the text in Portuguese produced by these students in educational activities. The model is based on machine learning, using previously labelled training databases, text representation models, and classification techniques. The ESSAYS and myPersonality databases, which are used by several studies in the field of personality identification from text, served as the training basis and as the reference for testing the classifiers. Text was represented using the LIWC lexicon as well as statistical representations in the n-gram and Word2Vec models. Text classification techniques were also evaluated, and the use of a classifier combination strategy is proposed. To validate the model, a practical experiment was conducted in an educational environment. The results indicate the viability of using the IP3 model to identify students' personality profiles based only on the texts they produce in educational environments. Keywords: personality recognition, text classification, text representation, natural language processing, machine learning.
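The abstract above proposes a classifier combination strategy but does not say which one. As a minimal sketch, assuming simple majority voting (an assumption, not necessarily the IP3 method), the outputs of several classifiers for one text could be combined as follows. The function name and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in the list.
    Assumes `predictions` is a non-empty list of hashable labels.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of three classifiers (e.g. trained on lexicon,
# n-gram and embedding features) for a single student text.
votes = ["extravert", "introvert", "extravert"]
print(majority_vote(votes))
```

Majority voting needs only the predicted labels, so it works across classifiers that use different text representations. Weighted voting or stacking would be alternatives where per-classifier confidence is available.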

    Information models in sentiment analysis based on linguistic resources

    The beginning of the new millennium was marked by the rapid development of social networks, cloud-based internet technologies, and applications of artificial intelligence in web tools. The extremely rapid growth in the number of texts on the Internet (blogs, e-commerce websites, forums, discussion groups, short-message systems, social networks, and news portals) has increased the need for methods of rapid, comprehensive, and accurate text analysis. The development of language technologies is therefore important; their primary tasks include document classification, document clustering, information retrieval, word-sense disambiguation, text extraction, machine translation, computer speech recognition, natural language generation, sentiment analysis, etc. In computational linguistics, several different names are in use for the area concerned with processing emotions in text: sentiment classification, opinion mining, sentiment analysis, sentiment extraction. By its nature and the methods it uses, sentiment analysis of text belongs to the field of computational linguistics that deals with text classification. In the process of analysing emotions we generally speak of three kinds of text classification:...

    Quantitative Assessment of Factors in Sentiment Analysis

    Sentiment can be defined as a tendency to experience certain emotions in relation to a particular object or person. Sentiment may be expressed in writing, in which case determining that sentiment algorithmically is known as sentiment analysis. Sentiment analysis is often applied to Internet texts such as product reviews, websites, blogs, or tweets, where automatically determining published feeling towards a product or service is very useful to marketers or opinion analysts. The main goal of sentiment analysis is to identify the polarity of natural language text. This thesis sets out to examine quantitatively the factors that have an effect on sentiment analysis. The factors that are commonly used in sentiment analysis are text features, sentiment lexica or resources, and the machine learning algorithms employed. The main aim of this thesis is to investigate systematically the interaction between sentiment analysis factors and machine learning algorithms in order to improve sentiment analysis performance as compared to the opinions of human assessors. A software system known as TJP was designed and developed to support this investigation. The research reported here has three main parts. Firstly, the role of data pre-processing was investigated with TJP using a combination of features together with publicly available datasets. This considers the relationship and relative importance of superficial text features such as emoticons, n-grams, negations, hashtags, repeated letters, special characters, slang, and stopwords. The resulting statistical analysis suggests that a combination of all of these features achieves better accuracy with the dataset and has a considerable effect on system performance. Secondly, the effect of human marked-up training data was considered, since this is required by supervised machine learning algorithms. The results gained from TJP suggest that training data greatly augments sentiment analysis performance. However, the combination of training data and sentiment lexica seems to provide optimal performance. Nevertheless, one particular sentiment lexicon, AFINN, contributed better than others in the absence of training data, and therefore would be appropriate for unsupervised approaches to sentiment analysis. Finally, the performance of two sophisticated ensemble machine learning algorithms was investigated. Both the Arbiter Tree and Combiner Tree were chosen since neither of them has previously been used with sentiment analysis. The objective here was to demonstrate their applicability and effectiveness compared to that of the leading single machine learning algorithms, Naïve Bayes and Support Vector Machines. The results showed that whilst either can be applied to sentiment analysis, the Arbiter Tree ensemble algorithm achieved better accuracy than either the Combiner Tree or any single machine learning algorithm.
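To illustrate the unsupervised, lexicon-based approach mentioned in the abstract above, here is a minimal sketch of polarity scoring with a word-valence lexicon and simple negation handling. It is not the TJP system. The lexicon below is a hypothetical toy in the style of AFINN (integer valence per word); its entries and scores are illustrative and are not taken from AFINN itself.

```python
import re

# Hypothetical toy lexicon (AFINN-style integer valences, illustrative only).
TOY_LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "awful": -3, "hate": -3}
NEGATIONS = {"not", "no", "never"}

def polarity(text):
    """Classify text as 'positive', 'negative' or 'neutral' without training data."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            # Flip the valence of the immediately following token only.
            negate = True
            continue
        value = TOY_LEXICON.get(tok, 0)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this great product"))
print(polarity("This is not good"))
```

The negation rule only reaches the next word, so a phrase such as "not very good" is still scored as positive. A fuller treatment would use a wider negation window and the other surface features listed in the abstract (emoticons, hashtags, repeated letters).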