12 research outputs found

    Sentiment Analysis: An Overview from Linguistics

    Sentiment analysis is a growing field at the intersection of linguistics and computer science that attempts to automatically determine the sentiment, or positive/negative opinion, contained in text. Sentiment can be characterized as positive or negative evaluation expressed through language. Common applications of sentiment analysis include automatically determining whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed. Sentiment analysis is now a common tool in the repertoire of social media analysis carried out by companies, marketers and political analysts. Research on sentiment analysis extracts information from positive and negative words in text, from the context of those words, and from the linguistic structure of the text. This brief survey examines in particular the contributions that linguistic knowledge can make to the problem of automatically determining sentiment.

    Subjectivity Analysis In Opinion Mining - A Systematic Literature Review

    Subjectivity analysis determines whether a text is subjective by using subjective clues. It is the first task in the opinion mining process. The difference between subjectivity analysis and polarity determination is that the latter processes subjective text to determine its orientation as positive or negative. Many techniques have been used to separate subjective from objective text. This paper uses a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were drawn up for this purpose. Information such as technique, corpus, subjective clue representation and performance was extracted from 97 articles, known as primary studies. This information was analysed to identify the strengths and weaknesses of each technique, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of the studies use a machine learning approach to identify and learn subjective text, because the subjectivity analysis problem is viewed as a classification problem. This approach outperformed other approaches, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.

    Cross-lingual sentiment classification using semi-supervised learning

    Cross-lingual sentiment classification aims to use annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools to directly project information from one language into another. However, differing term distributions between translated and original documents, translation errors, and the different intrinsic structure of documents in various languages lead to low performance in sentiment classification. Furthermore, because different languages contain different linguistic terms, translated documents cannot cover all the vocabulary that exists in the original documents. The aim of this thesis is to propose an enhanced framework for cross-lingual sentiment classification that overcomes the aforementioned problems in order to improve classification performance. A combination of active learning and semi-supervised learning, in both single-view and bi-view frameworks, is proposed to incorporate unlabelled data from the target language in order to reduce term distribution divergence. Using bi-view documents can partially alleviate the negative effects of translation errors. Multi-view semi-supervised learning is also used to overcome the problem of low term coverage by employing multiple source languages. Features extracted from multiple source languages can cover more of the vocabulary in the test data, and consequently more sentiment-bearing terms can be used in the classification process. Content similarities between labelled and unlabelled documents are used, through a graph-based semi-supervised learning approach, to incorporate the structure of documents in the target language into the learning process. Performance evaluation on sentiment data sets in four different languages confirms the effectiveness of the proposed approaches in comparison with well-known baseline classification methods. The experiments show that incorporating unlabelled data from the target language can effectively improve classification performance. Experimental results also show that using multiple source languages in the multi-view learning model outperforms other methods. The proposed framework is flexible enough to be applied to any new language, and it can therefore be used to develop multilingual sentiment analysis systems.
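
    As a rough illustration of how unlabelled target-language documents can be folded into training, the sketch below shows plain self-training, one generic form of semi-supervised learning. It is not the thesis's framework (which combines active learning with bi-view, multi-view and graph-based methods); the word-count classifier, the function names and the `min_margin` threshold are all assumptions made for the example.

```python
from collections import Counter

def train(labelled):
    """Count how often each word appears under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Return (label, margin); the margin is the gap between class scores."""
    words = text.lower().split()
    pos = sum(counts["pos"][w] for w in words)
    neg = sum(counts["neg"][w] for w in words)
    return ("pos" if pos >= neg else "neg"), abs(pos - neg)

def self_train(labelled, unlabelled, min_margin=1, rounds=3):
    """Hypothetical self-training loop: adopt confident pseudo-labels, retrain."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        counts = train(labelled)
        confident, rest = [], []
        for text in pool:
            label, margin = predict(counts, text)
            if margin >= min_margin:
                confident.append((text, label))
            else:
                rest.append(text)
        if not confident:
            break  # nothing new to learn from
        labelled.extend(confident)
        pool = rest
    return train(labelled)

# Toy data: two labelled documents plus unlabelled "target" documents.
seed = [("great film", "pos"), ("awful film", "neg")]
target = ["great wonderful", "awful dreadful", "unknown words"]
model = self_train(seed, target)
print(predict(model, "dreadful story"))
```

    In this toy run the word "dreadful" never appears in the labelled seed data; it only acquires a negative association through a pseudo-labelled document, which is the effect the abstract attributes to incorporating unlabelled target-language data.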

    Unionization method for changing opinion in sentiment classification using machine learning

    Sentiment classification aims to determine whether an opinionated text expresses a positive, negative or neutral opinion. Most existing sentiment classification approaches have focused on supervised text classification techniques. One critical problem in sentiment classification is that a text collection may contain tens or hundreds of thousands of features, i.e. high dimensionality, which can be addressed by a dimension reduction approach. Nonetheless, although feature selection as a dimension reduction method can reduce the feature space to a smaller feature subset, the size of that subset commonly requires further reduction. In this research, a novel dimension reduction approach called feature unionization is proposed to construct a further reduced feature subset. This approach combines several features to create a single, more informative feature. Another challenge in sentiment classification is handling the concept drift problem in the learning step. Users' opinions change as target entities evolve over time. However, existing sentiment classification approaches do not consider the evolution of users' opinions. They assume that instances are independent, identically distributed and generated from a stationary distribution, even though they are generated from a stream distribution. In this study, a stream sentiment classification method is proposed to deal with changing opinion and imbalanced data distribution using ensemble learning and instance selection methods. Related to the concept drift problem, another important issue is handling feature drift in sentiment classification. To handle feature drift, relevant features need to be detected in order to update classifiers. Since the proposed feature unionization method is very effective at constructing more relevant features, it is further used to handle feature drift. Thus, a method to deal with concept and feature drift in stream sentiment classification was proposed. The effectiveness of the feature unionization method was compared with that of feature selection on fourteen publicly available datasets in the sentiment classification domain using three typical classifiers. The experimental results showed that the proposed approach is more effective than current feature selection approaches. In addition, the experimental results showed the effectiveness of the proposed stream sentiment classification method in comparison with static sentiment classification. The experiments, conducted on four datasets, showed that the proposed algorithm achieved better results, demonstrating the effectiveness of the proposed method.

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility in mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face some issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how to use social media mining to measure customer satisfaction and predict customer churn. It included a systematic review addressing the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models lack integration of structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It was generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Different fields, such as education, have different features, which makes applying the proposed model to them interesting, because it is based on text mining.

    Compositional language processing for multilingual sentiment analysis

    Programa Oficial de Doutoramento en Computación. 5009V01
    [Abstract] This dissertation presents new approaches in the field of sentiment analysis and polarity classification, oriented towards obtaining the sentiment of a phrase, sentence or document from a natural language processing point of view. It places special emphasis on methods to handle semantic compositionality, i.e. the ability to compose the sentiment of multiword phrases, where the global sentiment might be different from, or even opposite to, the one coming from each of their individual components; and on the application of these methods to multilingual scenarios. On the one hand, we introduce knowledge-based approaches to calculate the semantic orientation at the sentence level that can handle different phenomena relevant to the task (e.g. negation, intensification or adversative subordinate clauses). On the other hand, we describe how to build machine learning models to perform polarity classification from a different perspective, combining linguistic (lexical, syntactic and semantic) knowledge, with an emphasis on noisy texts and micro-texts. Experiments on standard corpora and in international evaluation campaigns show the competitiveness of the methods proposed here in monolingual, multilingual and code-switching scenarios. The contributions presented in the thesis have potential applications in the era of Web 2.0 and social media, such as determining the view of society about products, celebrities or events, identifying their strengths and weaknesses, or monitoring how these opinions evolve over time. We also show how some of the proposed models can be useful for other data analysis tasks.
    [Resumen] This thesis presents new techniques in the field of sentiment analysis and polarity classification, focused on obtaining the sentiment of a phrase, sentence or document following approaches based on natural language processing. Specifically, we focus on developing methods capable of handling compositional semantics, that is, with the ability to compose the sentiment of sentences where the global polarity may be different from, or even opposite to, the one that would be obtained individually for each of their terms; and on how these methods can be applied in multilingual settings. In the first part of this work, we introduce knowledge-based approaches to calculate the semantic orientation at the sentence level, taking into account linguistic constructions relevant to the task at hand (for example, negation, intensification, or adversative subordinate clauses). In the second part, we describe how to build polarity classifiers based on machine learning that combine lexical, syntactic and semantic information, focusing on their application to short texts of poor grammatical quality. The experiments carried out on standard collections and in international evaluation competitions show the effectiveness of the methods proposed here in monolingual, multilingual and code-switching settings. The contributions presented in this thesis have various applications in the era of Web 2.0 and social networks, such as determining the opinion that society holds about a product, celebrity or event; identifying their strengths and weaknesses; or monitoring how these opinions evolve over time. Finally, we also show how some of the proposed models can be useful for other data analysis tasks.
    [Resumo] This thesis presents new techniques in the field of sentiment analysis and polarity classification, oriented towards obtaining the sentiment of a phrase, sentence or document following approaches based on natural language processing. In particular, we focus on methods capable of handling compositional semantics: methods with the ability to compose the sentiment of sentences where the global sentiment may be different from, or even opposite to, the one that would be obtained individually for each of their terms; and on how these methods can be applied in multilingual settings. In the first part of the thesis, we introduce knowledge-based approaches to calculate the semantic orientation at the sentence level, taking into account linguistic constructions that are important for the task at hand (for example, negation, intensification or adversative subordinate clauses). In the second part, we describe how to build polarity classifiers based on machine learning that combine lexical, syntactic and semantic information, focusing on short texts of poor grammatical quality. The experiments carried out on standard collections and in international evaluation competitions show the effectiveness of the methods proposed here in monolingual, multilingual and code-switching settings. The contributions presented in this thesis have various applications in the era of Web 2.0 and social networks, such as determining the opinion that society holds about a product, celebrity or event; identifying their strengths and weaknesses; or monitoring how those opinions evolve over time. As a final point, we also show how some of the models proposed here can be useful for other data analysis tasks.
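
    To make the idea of compositional sentiment concrete, here is a minimal sketch of a knowledge-based scorer in which negation and intensification change the polarity contributed by the next sentiment word. It is only an illustration under stated assumptions, not the dissertation's method: the lexicon entries, their weights, the word lists and the function name `sentence_polarity` are all hypothetical, and the scope rule (a modifier applies to the next sentiment word, however far away) is a deliberate simplification.

```python
# Hypothetical toy resources; real systems use much larger lexicons.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}

def sentence_polarity(sentence):
    """Sum word polarities, letting modifiers act on the next sentiment word."""
    score = 0.0
    negate = False   # True while a negator is waiting for a sentiment word
    weight = 1.0     # accumulated intensifier strength
    for token in sentence.lower().split():
        if token in NEGATORS:
            negate = not negate
        elif token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * weight
            score += -value if negate else value
            negate, weight = False, 1.0  # modifiers are consumed
    return score

for text in ["the film was good", "the film was not good",
             "the film was very good", "the film was not very good"]:
    print(text, "->", sentence_polarity(text))
```

    A plain bag-of-words sum would give "good" and "not good" the same score; tracking the pending negator and intensifier is what lets the phrase-level polarity differ from that of its individual words.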