201 research outputs found

    A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends

    Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is frequently used today as more and more people get a chance to put out their thoughts due to the advent of social media. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work done in this field is on global languages like English, in recent years the importance of SA in local languages has also been widely recognized. This has led to considerable research in the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for enhancing the accuracy of results.

    PCROD: Context Aware Role based Offensive Detection using NLP/ DL Approaches

    With the increased use of social media, many people misuse online platforms by uploading offensive content and sharing it with a vast audience, which makes controlling such content necessary. In this work we concentrate on the issue of finding offensive text in social media. Existing offensive text detection systems treat weak pejoratives like ‘idiot’ and extremely indecent pejoratives like ‘f***’ as equally offensive, irrespective of formal and informal contexts. In fact, weak pejoratives in informal discussions among friends are casual and common and are not offensive, but the same words can be offensive when expressed in formal discussions. The crucial challenges in role-based offensive detection in text are i) considering the roles while classifying the text as offensive or not and ii) creating a contextual dataset that includes both formal and informal roles. To tackle these challenges, we develop a deep neural network based model known as context aware role based offensive detection (CROD). We examine CROD on a manually created dataset collected from social networking sites. Results show that CROD gives better performance with RoBERTa, with an accuracy of 94%, while considering the context and role in data specifics.

    Enhancing Hate Speech Detection in Sinhala Language on Social Media using Machine Learning

    To counter the harmful dissemination of hate speech on social media, especially abusive outbursts of racism and sexism, automatic and accurate detection is crucial. However, a significant challenge lies in the vast sparsity of available data, which hinders accurate classification. This study presents a novel approach to Sinhala hate speech detection on social platforms. By coupling a global feature selection process with traditional machine learning, the research scrutinizes hate speech intricacies. A class-based variable feature selection process evaluates significance via global and local scores, identifying optimal values for prevalent classifiers. Utilizing class-based and corpus-based evaluations, we pinpoint optimal feature values for classifiers like SVM, MNB, and RF. Our results reveal notable enhancements in performance, specifically the F1-Score, underscoring how feature selection and parameter tuning work in tandem to boost model efficacy. Furthermore, the study explores nuanced variations in classifier performance across training and testing datasets, emphasizing the importance of model generalization.

    A hybrid dependency-based approach for Urdu sentiment analysis

    In the digital age, social media has emerged as a significant platform, generating a vast amount of raw data daily. This data reflects the opinions of individuals from diverse backgrounds, races, cultures, and age groups, spanning a wide range of topics. Businesses can leverage this data to extract valuable insights, improve their services, and effectively reach a broader audience based on users’ expressed opinions on social media platforms. To harness the potential of this extensive and unstructured data, a deep understanding of Natural Language Processing (NLP) is crucial. Existing approaches for sentiment analysis (SA) often rely on word co-occurrence frequencies, which prove inefficient in practical scenarios. Identifying this research gap, this paper presents a framework for concept-level sentiment analysis, aiming to enhance the accuracy of SA. A comprehensive Urdu language dataset was constructed by collecting data from YouTube, consisting of various talks and reviews on topics such as movies, politics, and commercial products. The dataset was further enriched by incorporating language rules and Deep Neural Networks (DNN) to optimize polarity detection. For sentiment analysis, the proposed framework employs predefined rules to trigger sentiment flow from words to concepts, leveraging the dependency relations among different words in a sentence based on Urdu language grammatical rules. In cases where predefined patterns are not triggered, the framework seamlessly switches to its sub-symbolic counterpart, passing the data to the DNN for sentence classification. Experimental results demonstrate that the proposed framework surpasses state-of-the-art approaches, including LSTM, CNN, SVM, LR, and MLP, achieving an improvement of 6–7% on an Urdu dataset. In conclusion, this research paper introduces a novel framework for concept-level sentiment analysis of Urdu language data sourced from social media platforms. By combining language rules and DNN, the proposed framework demonstrates superior performance compared to existing methodologies, showcasing its effectiveness in accurately analyzing sentiment in Urdu text data.

    Forms and functions of codeswitching to Hindi/Urdu in Indian English and Pakistani English

    This study examines the forms and functions of codeswitching to Hindi/Urdu in Indian English and Pakistani English. The aim is to find out how these local languages manifest and are used in the local non-native variety of English. Previous research on codeswitching to Hindi and Urdu in Indian English and Pakistani English has mostly concentrated on the use of single lexical items and their impact on the lexicon of the local English, leaving the more varied ways in which the local languages manifest relatively unresearched. The material was collected from the Corpus of Global Web-based English (GloWbE). This study additionally served as a methodological experiment, using the most frequent Hindi and Urdu words to locate and collect codeswitches in the corpus. The analysis of the structural patterns showed that Hindi and Urdu codeswitches manifest in a variety of forms, ranging from longer intersentential codeswitches for complete Hindi and Urdu sentences to interclausal switches, i.e. switches between main and dependent clauses, and to shorter intraclausal codeswitches like words and phrases. The structural analysis also revealed that the structural patterns appear to follow the same tendencies in both Indian and Pakistani English. The Hindi/Urdu codeswitches also served diverse types of functions. The switches could roughly be divided into switches with a communicative function and cultural switches. Communicative functions included, among others, quotations, figurative language, conveying greetings and prayers, interjections, reiterations, and metalinguistic commentary. Cultural codeswitches expressed objects and concepts specific to the Indian and Pakistani culture. Cultural switches also functioned as references to the Indian and Pakistani culture, implying the Indianness or Pakistaniness of something or someone.

    Hate Speech and Offensive Language Detection in Bengali

    Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial due to its impact on the race, gender, or religion in an unprejudiced society. However, while there is extensive research on hate speech detection in English, there is a gap in hateful content detection in low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the limitations of existing research, in this study we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by looking into the posts misclassified by the models. When training on the actual and Romanized datasets separately, we observe that XLM-Roberta performs the best. Further, we find that with joint training and few-shot training, MuRIL outperforms other models by interpreting the semantic expressions better. We make our code and dataset public for others. Comment: Accepted at AACL-IJCNLP 202