188 research outputs found

    A prior case study of natural language processing on different domain

    At present, computers do not understand ordinary human language, which remains a major barrier between humans and digital systems. Researchers have therefore developed technology that lets digital machines provide information to users. Natural language processing (NLP) is a branch of AI with significant implications for how computers and humans interact, and it has become an essential technology for bridging the communication gap between humans and digital data. This study discusses the need for NLP in current computing, along with different approaches and their applications. It also highlights the key challenges in developing new NLP models.

    Urdu Speech and Text Based Sentiment Analyzer

    Discovering what other people think has always been a key part of how we gather information. Thanks to the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, people can now actively use information technology to seek out and understand the opinions of others. Because it is central to understanding those opinions, sentiment analysis (SA) is an important task. Existing research, however, focuses primarily on English, with only a small amount of work devoted to low-resource languages. This work presents a new multi-class Urdu dataset for sentiment analysis based on user evaluations, collected from Twitter. The dataset includes 10,000 reviews that human experts have carefully classified into two categories: positive and negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish baseline results. Five lexicon- and rule-based algorithms (Naive Bayes, Stanza, TextBlob, VADER, and Flair) are evaluated, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms. Comment: Sentiment Analysis, Opinion Mining, Urdu language, polarity assessment, lexicon-based metho
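
    As a rough illustration of what a lexicon- and rule-based polarity classifier does, here is a minimal sketch. It is not the method or the dataset from the work above: the tiny English lexicon, the negation list, and the `polarity` function are all hypothetical, and the "neutral" label for a zero score is an added assumption, since the dataset described uses only positive and negative labels.

```python
# Hypothetical toy lexicon (word -> polarity weight); illustration only.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Hypothetical negation cues that flip the next sentiment-bearing word.
NEGATIONS = {"not", "never", "no"}


def polarity(text: str) -> str:
    """Sum lexicon weights over tokens, flipping the sign after a negation."""
    score = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0.0)
        if value:
            score += -value if negate else value
            negate = False  # the negation is consumed by this word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties and unknown words map to neutral


if __name__ == "__main__":
    print(polarity("The food was great"))
    print(polarity("not good at all"))
```

    A real system for Urdu would need an Urdu lexicon and the pre-processing steps (word segmentation, normalization) that such a toy example ignores.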

    A Deterministic Finite-State Morphological Analyzer for Urdu Nominal System

    A morphological analyzer is a computational process that combines lemmas with other linguistic features to produce new lexical word forms. This paper investigates the processing of the nominal system in the Urdu language. It focuses on the inflection of noun forms and studies number, gender, person, and case representations, using a Finite State Machine (FSM) to analyze and generate all possible forms of the standardized registers. Applying the analysis with this tool produces and displays all possible structures and their declensions. The study adds the necessary features and values to the lexical concatenating nouns according to their patterns. The accuracy score of the output is 92.7, where the actual output depends on the detailed design of the FSM and the specific morphological processes provided to the finite-state tools.
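
    To make the finite-state idea concrete, the following minimal sketch builds a deterministic automaton (a trie over surface forms) whose accepting states carry morphological analyses. It is not the analyzer from the paper: the romanized stems, the simplified four-cell paradigm for masculine nouns ending in -a, and the names `build_dfa` and `analyze` are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed, simplified paradigm (romanized): (number, case) -> suffix.
PARADIGM = {
    ("singular", "direct"): "a",
    ("plural", "direct"): "e",
    ("singular", "oblique"): "e",
    ("plural", "oblique"): "on",
}
# Assumed toy lexicon: stem -> lemma.
STEMS = {"lark": "larka", "kamr": "kamra"}


def build_dfa(stems, paradigm):
    """Return (transitions, outputs) for a trie-shaped deterministic automaton."""
    transitions = {}              # (state, char) -> next state
    outputs = defaultdict(list)   # accepting state -> list of analyses
    next_state = 1                # state 0 is the start state
    for stem, lemma in stems.items():
        for (number, case), suffix in paradigm.items():
            state = 0
            for ch in stem + suffix:
                key = (state, ch)
                if key not in transitions:
                    transitions[key] = next_state
                    next_state += 1
                state = transitions[key]
            outputs[state].append((lemma, number, case))
    return transitions, outputs


def analyze(word, transitions, outputs):
    """Walk the automaton; return all analyses, or [] if the word is rejected."""
    state = 0
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:
            return []
    return outputs.get(state, [])


if __name__ == "__main__":
    t, o = build_dfa(STEMS, PARADIGM)
    for form in ("larka", "larke", "larkon", "larki"):
        print(form, analyze(form, t, o))
```

    The automaton itself is deterministic (one transition per state and character), yet a single accepting state can hold several analyses, which is how an ambiguous form such as "larke" is reported as both direct plural and oblique singular.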

    A survey on sentiment analysis in Urdu: A resource-poor language

    © 2020 Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subjects such as products, services, events, and political opinions. While the volume of studies on sentiment analysis is expanding rapidly, these studies mostly address the English language. The primary goal of this study is to present a state-of-the-art survey that identifies the progress and shortcomings of Urdu sentiment analysis and proposes remedies. Methods: We describe the advances made so far in this area by categorising the studies along three dimensions: text pre-processing, lexical resources, and sentiment classification. The pre-processing operations include word segmentation, text cleaning, spell checking, and part-of-speech tagging. Lexical resources, including corpora and lexicons, were evaluated, and sentiment analysis constructs such as opinion words, modifiers, and negations were investigated. Results and conclusions: Performance is reported for each of the reviewed studies. The experimental results and proposals put forward in this paper provide the groundwork for further studies on Urdu sentiment analysis.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen