    Role of sentiment classification in sentiment analysis: a survey

    Through a survey of the literature, the role of sentiment classification in sentiment analysis has been reviewed. The review identifies the research challenges involved in tackling sentiment classification. A total of 68 articles published between 2015 and 2017 have been reviewed along six dimensions, viz., sentiment classification, feature extraction, cross-lingual sentiment classification, cross-domain sentiment classification, lexica and corpora creation, and multi-label sentiment classification. This study discusses the prominence and effects of sentiment classification in sentiment evaluation; much further research is still needed to achieve productive results.

    A Pointillism Approach for Natural Language Processing of Social Media

    Natural language processing tasks typically start with the basic unit of words; from words and their meanings, a big picture is then constructed of what documents and other larger constructs mean in terms of the topics discussed. Social media is very challenging for natural language processing because it challenges the notion of a word. Social media users regularly use words that are not in even the most comprehensive lexicons. These new words can be unknown named entities that have suddenly risen to prominence because of a current event, or they might be neologisms newly created to emphasize meaning or evade keyword filtering. Chinese social media is particularly challenging. The Chinese language poses challenges for word-based natural language processing even in formal usage, and social media only makes Chinese word segmentation more difficult. Thus, even knowing where the word boundaries are in a social media corpus is a difficult proposition. For these reasons, in this document I propose the Pointillism approach to natural language processing. In the Pointillism approach, language is viewed as a time series, or sequence of points, that represents the grams' usage over time. Time is an important aspect of the Pointillism approach. Detailed timing information, such as the timestamps of when posts were posted, contains correlations based on human patterns and current events. This timing information provides the necessary context to build words and phrases out of trigrams and then group those words and phrases into topical clusters. Rather than words that have individual meanings, the basic unit of the Pointillism approach is the character trigram. These grams take on meaning in aggregate when they appear together in a way that is correlated over time.
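The trigram-as-time-series view can be illustrated with a minimal Python sketch. It assumes posts arrive as hypothetical (timestamp, text) pairs and counts every overlapping character trigram per hourly bin (the dissertation bins posts by the hour). It does no cleaning of punctuation or whitespace, and the function names are illustrative rather than taken from the dissertation.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sketch: function names and input format are illustrative assumptions.


def char_trigrams(text):
    """Return every overlapping 3-character substring of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def trigram_time_series(posts):
    """Map each trigram to a Counter of {hour_bin: occurrence count}.

    posts: iterable of (timestamp, text) pairs, where timestamp is a datetime.
    Each post is assigned to the hour in which it was posted.
    """
    series = defaultdict(Counter)
    for ts, text in posts:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        for gram in char_trigrams(text):
            series[gram][hour] += 1
    return series


posts = [
    (datetime(2013, 1, 1, 9, 15), "雾霾很严重"),
    (datetime(2013, 1, 1, 9, 40), "今天雾霾很大"),
    (datetime(2013, 1, 1, 10, 5), "雾霾很严重"),
]
series = trigram_time_series(posts)
print(series["雾霾很"][datetime(2013, 1, 1, 9)])   # 2
print(series["雾霾很"][datetime(2013, 1, 1, 10)])  # 1
```

No segmentation step is needed: the trigram "雾霾很" is counted without deciding where any word begins or ends.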
I anticipate that the Pointillism approach can perform well in a variety of natural language processing tasks for many different languages, but in this document my focus is on trend analysis for Chinese microblogging. Microblog posts carry a timestamp of when they were posted that is accurate to the minute or second (though in this dissertation I bin posts by the hour). To show that trigrams supplemented with frequency information do collect scattered information into meaningful pieces, I first use the Pointillism approach to extract phrases. I conducted experiments on 4-character idioms, on a set of 500 phrases longer than 3 characters taken from the Chinese-language version of Wiktionary, and on Weibo's hot keywords. My results show that when words and topics do have a meme-like trend, they can be reconstructed from trigrams alone. For example, for 4-character idioms that appear at least 99 times in one day in my data, the unconstrained precision (that is, precision that allows for deviation from a lexicon when the result is just as correct as the lexicon version of the word or phrase) is 0.93. For longer words and phrases collected from Wiktionary, including neologisms, the unconstrained precision is 0.87. I consider these results very promising, because they suggest that it is feasible for a machine to reconstruct complex idioms, phrases, and neologisms with good precision without any notion of words. Next, I examine the potential of the Pointillism approach for extracting topical trends from microblog posts related to environmental issues. Independent Component Analysis (ICA) is used to find the trigrams that share the same independent signal source, i.e., the same topic. By contrast, probabilistic topic models leverage co-occurrence to classify documents into the topics they have already learned, which makes it hard for them to extract topics in real time.
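To make the idea of reconstructing a 4-character idiom from trigrams concrete, here is a hypothetical sketch, not the dissertation's algorithm. Two trigrams that overlap on two characters (ABC and BCD) are merged into ABCD when their hourly count series have a Pearson correlation at or above an assumed threshold of 0.9. The overlap check is one simple way to use character order, and the correlation is one simple way to use timing. The names `pearson` and `merge_trigrams`, the threshold, and the sample counts are all illustrative.

```python
from math import sqrt

# Hypothetical sketch, not the dissertation's method: merge overlapping
# trigrams whose hourly counts rise and fall together.


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def merge_trigrams(series, threshold=0.9):
    """Propose 4-character candidates from overlapping, correlated trigrams.

    series: dict mapping trigram -> list of hourly counts (all the same length).
    'ABC' and 'BCD' overlap on 'BC'; if their count series correlate at or
    above threshold, 'ABCD' is proposed. Returns (candidate, r) pairs,
    strongest correlation first.
    """
    # Index trigrams by their first two characters for overlap lookup.
    by_prefix = {}
    for gram in series:
        by_prefix.setdefault(gram[:2], []).append(gram)
    candidates = []
    for left in series:
        for right in by_prefix.get(left[1:], []):
            r = pearson(series[left], series[right])
            if r >= threshold:
                candidates.append((left + right[2], r))
    return sorted(candidates, key=lambda c: -c[1])


series = {
    "一石二": [0, 5, 9, 2],
    "石二鸟": [0, 4, 10, 1],
    "二鸟很": [3, 0, 1, 4],
}
print(merge_trigrams(series))  # one candidate, 一石二鸟, with r ≈ 0.98
```

The pair 石二鸟/二鸟很 also overlaps, but its counts move in opposite directions, so it is rejected.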
However, the Pointillism approach can extract trends in real time, whether or not those trends have been discussed before. This is more challenging than phrase extraction: in phrase extraction, order information is used to narrow down the candidates, whereas for trend extraction only the frequencies of the trigrams are considered. The proposed approach is compared against a state-of-the-art topic extraction technique, Latent Dirichlet Allocation (LDA), on 9,147 labelled posts with timestamps. The experimental results show that the highest F1 score of the Pointillism approach with ICA is 4% better than that of LDA. Thus, using the Pointillism approach, the colorful and baroque uses of language that typify social media in challenging languages such as Chinese may in fact be accessible to machines. The thesis that my dissertation tests is this: for topic extraction in scenarios where no adequate lexicon is available, such as social media, the Pointillism approach uses timing information to outperform traditional techniques that are based on co-occurrence.
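The comparison against LDA is reported as an F1 score. For reference, here is a minimal per-topic F1 sketch that assumes each labelled post receives exactly one gold and one predicted topic label. The abstract does not say how extracted topics are matched to labels, so that matching is left out, and the topic names in the example are invented.

```python
# Minimal per-topic F1, assuming one gold and one predicted label per post.
# Topic names below are invented for illustration.


def f1_score(gold, predicted, topic):
    """Harmonic mean of precision and recall for a single topic label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == topic and p == topic)
    fp = sum(1 for g, p in pairs if g != topic and p == topic)
    fn = sum(1 for g, p in pairs if g == topic and p != topic)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = ["smog", "smog", "water", "other"]
predicted = ["smog", "water", "water", "smog"]
print(f1_score(gold, predicted, "smog"))   # 0.5
print(f1_score(gold, predicted, "water"))  # about 0.667
```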

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a natural language processing task that has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is becoming one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting the limitations and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

    A prior case study of natural language processing on different domain

    In today's digital world, computers do not understand ordinary human language. This is a great barrier between humans and digital systems. Hence, researchers have developed advanced technologies that allow users to obtain information from digital machines. Natural language processing (NLP) is a branch of AI that has significant implications for the ways computers and humans can interact. NLP has become an essential technology in bridging the communication gap between humans and digital data. This study therefore explains the necessity of NLP in the current computing world, along with different approaches and their applications. It also highlights the key challenges in the development of new NLP models.

    Quantitative Perspectives on Fifty Years of the Journal of the History of Biology

    The Journal of the History of Biology (JHB) provides a fifty-year-long record for examining the evolution of the history of biology as a scholarly discipline. In this paper, we present a new dataset and a preliminary quantitative analysis of the thematic content of JHB from the perspectives of geography, organisms, and thematic fields. The geographic diversity of authors whose work appears in JHB has increased steadily since 1968, but the geographic coverage of the content of JHB articles remains strongly lopsided toward the United States, United Kingdom, and western Europe and has diversified much less dramatically over time. The taxonomic diversity of organisms discussed in JHB increased steadily between 1968 and the late 1990s but declined in later years, mirroring broader patterns of diversification previously reported in the biomedical research literature. Finally, we used a combination of topic modeling and nonlinear dimensionality reduction techniques to develop a model of multi-article fields within JHB. We found evidence for directional changes in the representation of fields on multiple scales. The diversity of JHB with regard to the representation of thematic fields has increased overall, with most of that diversification occurring in recent years. Drawing on the dataset generated in the course of this analysis, as well as web services in the emerging digital history and philosophy of science ecosystem, we have developed an interactive web platform for exploring the content of JHB, and we provide a brief overview of the platform in this article. As a whole, the data and analyses presented here provide a starting place for further critical reflection on the evolution of the history of biology over the past half-century. Comment: 45 pages, 14 figures, 4 tables

    A Survey on Sentiment Mining

    In the past, before spending money on a product, people used to ask their family, friends, and colleagues for their judgment and then make a decision. Today, with the boom of the World Wide Web, an enormous amount of data is available on the Internet, so when purchasing a product, customers make decisions by analyzing electronic text instead of asking other people. With the growth of e-commerce, crowds of people have been encouraged to write their opinions about numerous products in the form of statements and comments on countless sites such as Facebook, Flipkart, Snapdeal, Amazon, blogs, and Twitter. These comments express users' sentiments about the services and are categorized as positive, negative, or neutral. Different techniques are used for summarizing reviews, such as Information Retrieval, Text Mining, Text Classification, Data Mining, and Text Summarization. Countless people write their sentiments on many sites, and because these comments are written in no particular order, the usefulness of the information can suffer. Anyone who wants to assess the usability of a product would have to read all the comments manually and classify them, which is a practically burdensome task. Sentiment mining, also referred to as sentiment analysis, plays a major role in data mining. This field helps to analyze and classify users' opinions. In this paper we discuss various techniques, applications, and challenges faced by sentiment mining.

    Opinion Mining and Sentiment Analysis using Bayesian and Neural Networks Approaches

    Information technologies have firmly entered our lives, and it is impossible to imagine life without gadgets or the Internet. Today, social media is not only a source that broadcasts information to users; it also allows users to communicate and share their views and experiences with each other. Some portion of such data is subjective and contains opinionated information that can be analyzed to retrieve essential data and later used for various purposes, such as analysis and decision support. In order to use this type of data, the first step is to understand it and categorize the opinions it contains.
Hence, in this thesis, sentiment analysis techniques are studied in order to retrieve opinions from tweets. To ensure efficient classification, it is important to apply algorithms that perform well on this task. Therefore, the main goal of the thesis is to investigate algorithms that can be applied to opinion estimation. To that end, data preprocessing is performed and several experiments are conducted: the classifier is trained and tested on two different datasets with two different classifiers (Naive Bayes and a convolutional neural network). In addition, the influence of the training data on the classifier's efficiency is discussed.
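The Naive Bayes side of this comparison can be sketched as a multinomial classifier with add-one (Laplace) smoothing over lowercase whitespace tokens. This is an assumption for illustration: the abstract does not state which Naive Bayes variant, tokenizer, or preprocessing the thesis uses, and the class name and toy tweets are hypothetical. The convolutional network is not sketched.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: multinomial Naive Bayes with add-one smoothing.
# Variant, tokenizer, and data are assumptions, not the thesis's setup.


class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        """Return the class with the highest log posterior for text."""
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                # Smoothed log likelihood of token t given class c.
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


texts = ["great phone love it", "love the camera",
         "terrible battery hate it", "hate the screen"]
labels = ["pos", "pos", "neg", "neg"]
nb = NaiveBayes().fit(texts, labels)
print(nb.predict("love this camera"))   # pos
print(nb.predict("hate the battery"))   # neg
```

Working in log space avoids floating-point underflow when tweets contain many tokens.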