Information Access in a Multilingual World: Transitioning from Research to Real-World Applications
Multilingual Information Access (MLIA) is at a turning point: substantial real-world applications are being introduced after fifteen years of research into cross-language information retrieval, question answering, statistical machine translation and named entity recognition. Previous workshops on this topic have focused on research and small-scale applications. The focus of this workshop was on technology transfer from research to applications and on the future research needed to facilitate MLIA in an increasingly connected multilingual world.
A survey on sentiment analysis in Urdu: A resource-poor language
Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subject matters such as products, services, events and political opinions. While the volume of studies conducted on sentiment analysis is rapidly expanding, these studies mostly address the English language. The primary goal of this study is to present a state-of-the-art survey that identifies the progress and shortcomings of Urdu sentiment analysis and proposes remedies. Methods: We describe the advancements made thus far in this area by categorising the studies along three dimensions, namely text pre-processing, lexical resources and sentiment classification. The pre-processing operations include word segmentation, text cleaning, spell checking and part-of-speech tagging. Lexical resources, including corpora and lexicons, were evaluated, and sentiment analysis constructs such as opinion words, modifiers and negations were investigated. Results and conclusions: Performance is reported for each reviewed study. The experimental results and proposals put forward in this paper provide the groundwork for further studies on Urdu sentiment analysis.
A Comprehensive Review of Sentiment Analysis on Indian Regional Languages: Techniques, Challenges, and Trends
Sentiment analysis (SA) is the process of understanding emotion within a text. It helps identify the opinion, attitude, and tone of a text, categorizing it as positive, negative, or neutral. SA is frequently used today, as the advent of social media gives more and more people a chance to share their thoughts. Sentiment analysis benefits industries around the globe, such as finance, advertising, marketing, travel, and hospitality. Although the majority of work in this field is on global languages like English, in recent years the importance of SA in local languages has also been widely recognized. This has led to considerable research on the analysis of Indian regional languages. This paper comprehensively reviews SA in the following major Indian regional languages: Marathi, Hindi, Tamil, Telugu, Malayalam, Bengali, Gujarati, and Urdu. Furthermore, this paper presents techniques, challenges, findings, recent research trends, and future scope for improving the accuracy of results.
Text segmentation for analysing different languages
Over the past several years, researchers have applied different methods of text segmentation. Text segmentation is defined as a method of splitting a document into smaller segments, each assumed to carry its own relevant meaning. Those segments can be classified as a tag, word, sentence, topic, phrase or any other information unit. This study first reviews the different types of text segmentation methods used in different types of documents, and then discusses the various reasons for using them in opinion mining. The main contribution of this study is a summary of research papers from the past 10 years that applied text segmentation as their main approach to text analysis. Results show that word segmentation has been successfully and widely used for processing different languages.
Urdu Speech and Text Based Sentiment Analyzer
Discovering what other people think has always been a key aspect of our
information-gathering strategy. People can now actively utilize information
technology to seek out and comprehend the ideas of others, thanks to the
increased availability and popularity of opinion-rich resources such as online
review sites and personal blogs. Because it is central to
understanding people's opinions, sentiment analysis (SA) is a crucial task.
Existing research, however, focuses primarily on the English
language, with little work devoted to low-resource languages.
For sentiment analysis, this work presents a new multi-class Urdu dataset
based on user reviews. The data were collected from Twitter.
The proposed dataset includes 10,000 reviews that have been carefully
classified into two categories by human experts: positive and negative. The
primary purpose of this research is to construct a manually annotated dataset
for Urdu sentiment analysis and to establish a baseline result. Five
different lexicon- and rule-based algorithms, including Naive Bayes, Stanza,
TextBlob, VADER, and Flair, are employed, and the experimental results show that
Flair, with an accuracy of 70%, outperforms the other tested algorithms.
Comment: Sentiment Analysis, Opinion Mining, Urdu language, polarity
assessment, lexicon-based metho