655 research outputs found

    A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends

    Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is used frequently today because social media gives more and more people a chance to share their thoughts. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work in this field has been done on global languages like English, the importance of SA in local languages has also been widely recognized in recent years. This has led to considerable research on the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for improving the accuracy of results.

    Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation

    Multilingualism is widespread around the world and code-switching (CSW) is a common practice among different language pairs/tuples across locations and regions. However, there is still not much progress in building successful CSW systems, despite the recent advances in Massive Multilingual Language Models (MMLMs). We investigate the reasons behind this setback through a critical study of the existing CSW data sets (68) across language pairs in terms of the collection and preparation (e.g. transcription and annotation) stages. This in-depth analysis reveals that (a) most CSW data involves English, ignoring other language pairs/tuples, and (b) there are flaws in terms of representativeness in the data collection and preparation stages due to ignoring the location-based, socio-demographic and register variation in CSW. In addition, a lack of clarity on the data selection and filtering stages obscures the representativeness of CSW data sets. We conclude by providing a short check-list to improve the representativeness for forthcoming studies involving CSW data collection and preparation. Comment: Accepted for EMNLP'23 Findings (to appear in the EMNLP'23 Proceedings).

    Sentiment analysis on film review in Gujarati language using machine learning

    Sentiment analysis is one of the most fundamental areas of natural language processing. It deals with classifying text to determine the intent of its source, which may be a form of appreciation (positive) or criticism (negative). This paper compares the results obtained by applying classification algorithms with different classifiers, namely K-nearest neighbors (KNN) and multinomial Naive Bayes (MNB). These techniques are used to label a review as either a positive or a negative comment. The collected data is examined on the basis of film polarity datasets, and a comparison with available results is made for a thorough evaluation. This paper investigates the influence of the word-level count vectorizer and term frequency-inverse document frequency (TF-IDF) on film sentiment analysis. We conclude that the MNB classifier produces more accurate results with the TF-IDF vectorizer than with CountVectorizer, while the KNN classifier achieves the same accuracy with TF-IDF and CountVectorizer.
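
    The count-versus-TF-IDF comparison described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English reviews stand in for the Gujarati film data, the function names are hypothetical, and the code is a plain-Python stand-in rather than the authors' implementation. It covers only the multinomial Naive Bayes side (not KNN), uses the smoothed IDF form ln((1 + n) / (1 + df)) + 1, and applies no length normalization.

```python
import math
from collections import Counter

# Toy English reviews stand in for the Gujarati film reviews (assumption).
TRAIN = [
    ("a wonderful film with a great story", "pos"),
    ("great acting and a wonderful cast", "pos"),
    ("a boring film with a terrible story", "neg"),
    ("terrible acting and a boring plot", "neg"),
]


def count_features(docs):
    """Raw term counts per document (the count-vectorizer view)."""
    return [Counter(doc.lower().split()) for doc in docs]


def idf_weights(train_counts):
    """Smoothed inverse document frequency learned from the training set."""
    n = len(train_counts)
    df = Counter(term for counts in train_counts for term in counts)
    return {t: math.log((1 + n) / (1 + f)) + 1.0 for t, f in df.items()}


def tfidf_features(counts, idf):
    """Scale counts by IDF; terms unseen during training are dropped."""
    return [{t: tf * idf[t] for t, tf in c.items() if t in idf} for c in counts]


def train_mnb(features, labels, alpha=1.0):
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""
    vocab = {t for f in features for t in f}
    model = {}
    for c in sorted(set(labels)):
        totals = Counter()
        for f, y in zip(features, labels):
            if y == c:
                totals.update(f)  # accumulate (possibly fractional) weights
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        log_like = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def predict_mnb(model, feature):
    """Return the class with the highest log-posterior score."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(
            w * log_like[t] for t, w in feature.items() if t in log_like
        )
    return max(model, key=score)


docs = [d for d, _ in TRAIN]
labels = [y for _, y in TRAIN]
train_counts = count_features(docs)
idf = idf_weights(train_counts)

count_model = train_mnb(train_counts, labels)
tfidf_model = train_mnb(tfidf_features(train_counts, idf), labels)


def classify(text):
    """Classify one review with both feature schemes."""
    counts = count_features([text])
    weighted = tfidf_features(counts, idf)
    return predict_mnb(count_model, counts[0]), predict_mnb(tfidf_model, weighted[0])


print(classify("a great story"))
print(classify("a boring plot"))
```

    On data this small both feature schemes agree; reproducing the accuracy gap reported above would require a held-out split of a real review corpus.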

    Code Mixed Cross Script Factoid Question Classification - A Deep Learning Approach

    Before the advent of the Internet era, code-mixing was mainly used in the spoken form. However, with the recent popularity of informal networking platforms in social media, such as Facebook, Twitter, and Instagram, code-mixing is being used more and more in written form. User-generated social media content is becoming an increasingly important resource in applied linguistics. Recent trends in social media usage have led to a proliferation of studies on social media content. Multilingual social media users often write native language content in non-native script (cross-script). Recently Banerjee et al. [9] introduced the code-mixed cross-script question answering research problem and reported that the ever-increasing social media content could serve as a potential digital resource for less-computerized languages to build question answering systems. Question classification is a core task in question answering in which questions are assigned a class or a number of classes which denote the expected answer type(s). In this research work, we address the question classification task as part of the code-mixed cross-script question answering research problem. We combine a deep learning framework with feature engineering to address the question classification task and improve the state-of-the-art question classification accuracy by over 4% for code-mixed cross-script questions. The work of the third author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project. Banerjee, S.; Kumar Naskar, S.; Rosso, P.; Bandyopadhyay, S. (2018). Code Mixed Cross Script Factoid Question Classification - A Deep Learning Approach. Journal of Intelligent & Fuzzy Systems. 34(5):2959-2969. https://doi.org/10.3233/JIFS-169481