411 research outputs found

    Sentiment Analysis Using Machine Learning Techniques

    Get PDF
    Before buying a product, people usually go to various shops in the market, query about the product, cost, and warranty, and then finally buy the product based on the opinions they received on cost and quality of service. This process is time consuming and the chances of being cheated by the seller are more as there is nobody to guide as to where the buyer can get authentic product and with proper cost. But now-a-days a good number of persons depend upon the on-line market for buying their required products. This is because the information about the products is available from multiple sources; thus it is comparatively cheap and also has the facility of home delivery. Again, before going through the process of placing order for any product, customers very often refer to the comments or reviews of the present users of the product, which help them take decision about the quality of the product as well as the service provided by the seller. Similar to placing order for products, it is observed that there are quite a few specialists in the field of movies, who go though the movie and then finally give a comment about the quality of the movie, i.e., to watch the movie or not or in five-star rating. These reviews are mainly in the text format and sometimes tough to understand. Thus, these reports need to be processed appropriately to obtain some meaningful information. Classification of these reviews is one of the approaches to extract knowledge about the reviews. In this thesis, different machine learning techniques are used to classify the reviews. Simulation and experiments are carried out to evaluate the performance of the proposed classification methods. It is observed that a good number of researchers have often considered two different review datasets for sentiment classification namely aclIMDb and Polarity dataset. The IMDb dataset is divided into training and testing data. Thus, training data are used for training the machine learning algorithms and testing data are used to test the data based on the training information. On the other hand, polarity dataset does not have separate data for training and testing. Thus, k-fold cross validation technique is used to classify the reviews. Four different machine learning techniques (MLTs) viz., Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and Linear Discriminant Analysis (LDA) are used for the classification of these movie reviews. Different performance evaluation parameters are used to evaluate the performance of the machine learning techniques. It is observed that among the above four machine learning algorithms, RF technique yields the classification result, with more accuracy. Secondly, n-gram based classification of reviews are carried out on the aclIMDb dataset..

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    Get PDF
    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 Octobre 2010

    Selecting and Generating Computational Meaning Representations for Short Texts

    Full text link
    Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives—methodology, systems, and applications—and show how each contributes to a fuller understanding of the task.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd

    Utilizing microblogs for improving automatic news high-lights extraction

    Get PDF

    The Best Explanation:Beyond Right and Wrong in Question Answering

    Get PDF

    Three Essays on Opinion Mining of Social Media Texts

    Get PDF
    This dissertation research is a collection of three essays on opinion mining of social media texts. I explore different theoretical and methodological perspectives in this inquiry. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. This method learns word associations from a large unannotated corpus. These associations are used to identify new sentiment words. Using a Twitter data set containing 743,069 tweets related to the stock market, I show that the sentiment lexicons generated using the proposed method significantly outperforms existing sentiment lexicons in sentiment classification. As sentiment analysis is being applied to different types of documents to solve different problems, the proposed method provides a useful tool to improve sentiment classification. The second essay focuses on improving supervised sentiment classification. In previous work on sentiment classification, a document was typically represented as a collection of single words. This method of feature representation suffers from severe ambiguity, especially in classifying short texts, such as microblog messages. I propose the use of dependency features in sentiment classification. A dependency describes the relationship between a pair of words even when they are distant. I compare the sentiment classification performance of dependency features with a few commonly used features in different experiment settings. The results show that dependency features significantly outperform existing feature representations. In the third essay, I examine the relationship between social media sentiment and stock returns. This is the first study to test the bidirectional effects in this relationship. Based on theories in behavioral finance research, I speculate that social media sentiment does not predict stock return, but rather that stock return predicts social media sentiment. I empirically test a set of research hypotheses by applying the vector autoregression (VAR) model on a social media data set, which is much larger than those used in previous studies. The hypotheses are supported by the results. The findings have significant implications for both theory and practice

    Optimizing scientific communication : the role of relative clauses as markers of complexity in English and German scientific writing between 1650 and 1900

    Get PDF
    The aim of this thesis is to show that both scientific English and German have become increasingly optimized for scientific communication from 1650 to 1900 by adapting the usage of relative clauses as markers of grammatical complexity. While the lexico-grammatical changes in terms of features and their frequency distribution in scientific writing during this period are well documented, in the present work we are interested in the underlying factors driving these changes and how they affect efficient scientific communication. As the scientific register emerges and evolves, it continuously adapts to the changing communicative needs posed by extra-linguistic pressures arising from the scientific community and its achievements. We assume that, over time, scientific language maintains communicative efficiency by balancing lexico-semantic expansion with a reduction in (lexico-)grammatical complexity on different linguistic levels. This is based on the idea that linguistic complexity affects processing difficulty and, in turn, communicative efficiency. To achieve optimization, complexity is adjusted on the level of lexico-grammar, which is related to expectation-based processing cost, and syntax, which is linked to working memory-based processing cost. We conduct five corpus-based studies comparing English and German scientific writing to general language. The first two investigate the development of relative clauses in terms of lexico-grammar, measuring the paradigmatic richness and syntagmatic predictability of relativizers as indicators of expectation-based processing cost. The results confirm that both levels undergo a reduction in complexity over time. The other three studies focus on the syntactic complexity of relative clauses, investigating syntactic intricacy, locality, and accessibility. Results show that intricacy and locality decrease, leading to lower grammatical complexity and thus mitigating memory-based processing cost. However, accessibility is not a factor of complexity reduction over time. Our studies reveal a register-specific diachronic complexity reduction in scientific language both in lexico-grammar and syntax. The cross-linguistic comparison shows that English is more advanced in its register-specific development while German lags behind due to a later establishment of the vernacular as a language of scientific communication.This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 110

    Extremism Arabic Text Detection using Rough Set Theory: Designing a Novel Approach

    Get PDF
    The linguistics related research and particularly, sentiment analysis using data-driven approaches, has been growing in recent years. However, the large number of users and excessive amount of information available on social media, make it difficult to detect extremism text on these platforms. The literature revealed a plethora of research studies focusing the sentiment analysis primarily, for English texts, however, very limited studies are available concerning the Arabic language which is the 4th mostly spoken language in the world. We first time in this study, propose a text detection mechanism for extremism orientations distinction in Arabic language, to improve the comprehension of subjective phrases. The study introduces a novel method based on Rough Set theory to enhance the accuracy of selected models and recognize text orientation reliably. Experimental outcomes indicate that the proposed method outperforms existing algorithms by contributing towards feature discriminations. Our method achieved 90.853%, 81.707% and 71.951% accuracies for unigram, bigram, and trigram representations, respectively. This study significantly contributes to the limited research in the field of machine learning and linguistics in Arabic language
    corecore