1,545 research outputs found

    Sentiment analysis of products’ reviews containing English and Hindi texts

    Online shopping is growing rapidly because it lets people buy from home and compare products using reviews written by other purchasers. When people buy a product, they express their emotions about it in the form of a review. In the Indian context, reviews are found to contain Hindi text along with English, and most of the Hindi text contains opinionated words such as bahut achha, bakbas, and pesa wasool. We have tried to identify the different Hindi texts that appear in product reviews written on Indian e-commerce portals. We have also developed a system that takes reviews containing both Hindi and English text and finds the sentiment expressed in each review for every attribute of the product, as well as a final review of the product.
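
    The abstract does not describe the system's internals. As a rough illustration only, a lexicon lookup over code-mixed text could look like the sketch below. The three Hindi phrases come from the abstract; the English entries, the polarity scores, and the function name `review_sentiment` are assumptions, and per-attribute scoring is left out.

```python
import re

# Hypothetical polarity lexicon. The Hindi phrases are quoted in the abstract;
# the English entries and all scores are assumptions made for this sketch.
LEXICON = {
    "bahut achha": 1,   # "very good"
    "pesa wasool": 1,   # "value for money"
    "bakbas": -1,       # "rubbish"
    "good": 1,
    "excellent": 1,
    "bad": -1,
    "poor": -1,
}


def review_sentiment(review: str) -> str:
    """Label a romanized Hindi-English review as positive, negative, or neutral."""
    tokens = re.findall(r"[a-z]+", review.lower())
    score, i = 0, 0
    while i < len(tokens):
        # Try a two-word phrase first so "bahut achha" is matched as a unit.
        bigram = " ".join(tokens[i:i + 2])
        if bigram in LEXICON:
            score += LEXICON[bigram]
            i += 2
        else:
            score += LEXICON.get(tokens[i], 0)
            i += 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(review_sentiment("Camera is bahut achha, totally pesa wasool!"))
```

    A real system would also need to handle spelling variants of romanized Hindi, which this lookup ignores.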

    A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends

    Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is frequently used today as more and more people get a chance to share their thoughts due to the advent of social media. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work done in this field is on global languages like English, in recent years the importance of SA in local languages has also been widely recognized. This has led to considerable research in the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for improving the accuracy of results.

    A Comparative Analysis of Opinion Mining and Sentiment Classification in Non-english Languages

    In the past decade, many opinion mining and sentiment classification studies have been carried out for opinions in English. However, the amount of work done for non-English text opinions is very limited. In this review, we investigate opinion mining and sentiment classification studies in three non-English languages to find the classification methods and the efficiency of each algorithm used in these methods. It is found that most of the research conducted for non-English languages has followed the methods used for English, with only limited usage of language-specific properties such as morphological variations. The application domains seem to be restricted to particular fields, and significantly less research has been conducted across domains. Keywords: Natural Language Processing, Text Mining, Machine Learning

    Sentiment Analysis of Product Reviews Containing English and Hindi Texts


    HSAS: Hindi Subjectivity Analysis System

    With the development of Web 2.0, there is an abundance of documents expressing users' opinions, attitudes, and sentiments in textual form. This user-generated textual content is an important source of information that helps organizations and governments make sound decisions. Textual information can be categorized into two types: facts and opinions. Subjectivity analysis is the automatic extraction of subjective information from the opinions posted by users; it divides the content into subjective and objective sentences. Most work in subjectivity analysis exists for English-language data, but with the introduction of the Unicode standard UTF-8, Hindi-language content on the web is growing very rapidly. In this paper, the Hindi Subjectivity Analysis System (HSAS) is proposed. It explores two different methods of generating a subjectivity lexicon using the available English-language resources, and compares them on the task of subjectivity analysis at the sentence level. The first method uses the English-language OpinionFinder subjectivity lexicon. The second method uses a small seed word list in Hindi and expands it to generate a subjectivity lexicon. Different evaluation strategies are used to validate the lexicon. We achieved 71.4% agreement with human annotators and ~80% classification accuracy on a parallel data set in English and Hindi. Extensive simulations conducted on the test dataset confirm the validity of the suggested method.
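
    The abstract does not say how the seed list is expanded. One common approach, assumed here purely for illustration, is a breadth-first walk over synonym links, followed by a sentence-level lookup and a simple agreement ratio against human labels. The toy synonym map (romanized Hindi), the depth limit, and all function names below are hypothetical and are not taken from HSAS.

```python
from collections import deque

# Hypothetical toy synonym map in romanized Hindi. A real system would draw
# on a lexical resource, which the abstract does not name.
SYNONYMS = {
    "achha": ["badhiya", "uttam"],   # good -> great, excellent
    "badhiya": ["shandar"],          # great -> splendid
    "bura": ["kharab"],              # bad -> spoiled/bad
}


def expand_seeds(seeds, synonyms, max_depth=2):
    """Grow a subjectivity lexicon by following synonym links breadth-first."""
    lexicon = set(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for syn in synonyms.get(word, []):
            if syn not in lexicon:
                lexicon.add(syn)
                queue.append((syn, depth + 1))
    return lexicon


def is_subjective(sentence, lexicon):
    """A sentence counts as subjective if it contains any lexicon word."""
    return any(token in lexicon for token in sentence.lower().split())


def agreement(predicted, gold):
    """Fraction of sentences where the system label matches the human label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


lexicon = expand_seeds(["achha", "bura"], SYNONYMS)
sentences = ["phone shandar hai", "phone kal aaya", "screen kharab hai"]
predicted = [is_subjective(s, lexicon) for s in sentences]
print(sorted(lexicon), predicted)
```

    The depth limit is a design choice of this sketch: following synonym links too far tends to pull in words whose meaning has drifted away from the seeds.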

    Sentiment Analysis of Assamese Text Reviews: Supervised Machine Learning Approach with Combined n-gram and TF-IDF Feature

    Sentiment analysis (SA) is a challenging application of natural language processing (NLP) in various Indian languages. However, there is limited research on sentiment categorization in Assamese texts. This paper investigates sentiment categorization on Assamese textual data using a dataset created by translating Bengali resources into Assamese using Google Translator. The study employs multiple supervised ML methods, including Decision Tree, K-nearest neighbour, Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine, combined with n-gram and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction methods. The experimental results show that Multinomial Naive Bayes and Support Vector Machine have over 80% accuracy in analyzing sentiments in Assamese texts, while the Unigram model performs better than higher-order n-gram models in both datasets. The proposed model is shown to be an effective tool for sentiment classification in domain-independent Assamese text data.
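
    To make the feature pipeline concrete, here is a minimal pure-Python sketch of combined n-gram TF-IDF features feeding a Multinomial Naive Bayes classifier. It is not the paper's code: the smoothed IDF formula, the Laplace smoothing constant `alpha`, the function names, and the English toy reviews (standing in for Assamese data) are all assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """All word n-grams of order 1..n_max (n_max=1 gives the unigram model)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def fit_idf(docs, n_max=2):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n_max)))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Raw term count times IDF; n-grams unseen in training are dropped."""
    counts = Counter(ngrams(doc, n_max))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train_nb(docs, labels, idf, n_max=2, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    class_counts = Counter(labels)
    weights = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf, n_max).items():
            weights[label][term] += w
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(weights[label].values()) + alpha * len(idf)
        log_prior = math.log(n_docs / len(docs))
        log_lik = {t: math.log((weights[label][t] + alpha) / total) for t in idf}
        model[label] = (log_prior, log_lik)
    return model


def predict(doc, model, idf, n_max=2):
    feats = tfidf(doc, idf, n_max)

    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(w * log_lik[t] for t, w in feats.items())

    return max(model, key=score)


# English toy reviews stand in for the Assamese data, which is not reproduced here.
docs = ["good phone great battery", "great camera good value",
        "bad phone poor battery", "poor camera bad value"]
labels = ["pos", "pos", "neg", "neg"]
idf = fit_idf(docs)
model = train_nb(docs, labels, idf)
print(predict("good battery", model, idf), predict("poor value", model, idf))
```

    Passing `n_max=1` to every function gives the unigram-only configuration that the abstract reports as performing best.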

    ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment

    We present a systematic study and comprehensive evaluation of large language models for automatic multilingual readability assessment. In particular, we construct ReadMe++, a multilingual multi-domain dataset with human annotations of 9757 sentences in Arabic, English, French, Hindi, and Russian collected from 112 different data sources. ReadMe++ offers more domain and language diversity than existing readability datasets, making it ideal for benchmarking multilingual and non-English language models (including mBERT, XLM-R, mT5, Llama-2, GPT-4, etc.) in the supervised, unsupervised, and few-shot prompting settings. Our experiments reveal that models fine-tuned on ReadMe++ outperform those trained on single-domain datasets, showcasing superior performance on multi-domain readability assessment and cross-lingual transfer capabilities. We also compare to traditional readability metrics (such as Flesch-Kincaid Grade Level and Open Source Metric for Measuring Arabic Narratives), as well as the state-of-the-art unsupervised metric RSRS (Martinc et al., 2021). We will make our data and code publicly available at: https://github.com/tareknaous/readme. Comment: We have added French and Russian as two new languages to the corpus
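
    For reference, the Flesch-Kincaid Grade Level named above is a fixed formula over words per sentence and syllables per word. The sketch below computes it for English text; the vowel-group syllable counter and the function names are simplifying assumptions, not part of the metric's definition or of the ReadMe++ code.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    # Count runs of sentence-ending punctuation; assume at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


simple = "The cat sat on the mat."
dense = "Comprehensive multilingual readability evaluation necessitates considerable annotation."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))
```

    Because the formula depends on English syllable counts, it does not transfer directly to Arabic, Hindi, or Russian text.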