406 research outputs found

    Text Summarization Technique for Punjabi Language Using Neural Networks

    Get PDF
    In the contemporary world, utilization of digital content has risen exponentially. For example, newspaper and web articles, status updates, advertisements etc. have become an integral part of our daily routine. Thus, there is a need to build an automated system to summarize such large documents of text in order to save time and effort. Although, there are summarizers for languages such as English since the work has started in the 1950s and at present has led it up to a matured stage but there are several languages that still need special attention such as Punjabi language. The Punjabi language is highly rich in morphological structure as compared to English and other foreign languages. In this work, we provide three phase extractive summarization methodology using neural networks. It induces compendious summary of Punjabi single text document. The methodology incorporates pre-processing phase that cleans the text; processing phase that extracts statistical and linguistic features; and classification phase. The classification based neural network applies an activation function- sigmoid and weighted error reduction-gradient descent optimization to generate the resultant output summary. The proposed summarization system is applied over monolingual Punjabi text corpus from Indian languages corpora initiative phase-II. The precision, recall and F-measure are achieved as 90.0%, 89.28% an 89.65% respectively which is reasonably good in comparison to the performance of other existing Indian languages" summarizers.This research is partially funded by the Ministry of Economy, Industry and Competitiveness, Spain (CSO2017-86747-R)

    Pre-processing of Domain Ontology Graph Generation System in Punjabi

    Full text link
    This paper describes pre-processing phase of ontology graph generation system from Punjabi text documents of different domains. This research paper focuses on pre-processing of Punjabi text documents. Pre-processing is structured representation of the input text. Pre-processing of ontology graph generation includes allowing input restrictions to the text, removal of special symbols and punctuation marks, removal of duplicate terms, removal of stop words, extract terms by matching input terms with dictionary and gazetteer lists terms.Comment: 6 pages, 17 figures, 1 table, "Published with International Journal of Engineering Trends and Technology (IJETT)

    Opinion Mining Summarization and Automation Process: A Survey

    Get PDF
    In this modern age, the internet is a powerful source of information. Roughly, one-third of the world population spends a significant amount of their time and money on surfing the internet. In every field of life, people are gaining vast information from it such as learning, amusement, communication, shopping, etc. For this purpose, users tend to exploit websites and provide their remarks or views on any product, service, event, etc. based on their experience that might be useful for other users. In this manner, a huge amount of feedback in the form of textual data is composed of those webs, and this data can be explored, evaluated and controlled for the decision-making process. Opinion Mining (OM) is a type of Natural Language Processing (NLP) and extraction of the theme or idea from the user's opinions in the form of positive, negative and neutral comments. Therefore, researchers try to present information in the form of a summary that would be useful for different users. Hence, the research community has generated automatic summaries from the 1950s until now, and these automation processes are divided into two categories, which is abstractive and extractive methods. This paper presents an overview of the useful methods in OM and explains the idea about OM regarding summarization and its automation process

    A Survey of Biological Entity Recognition Approaches

    Get PDF
    There has been growing interest in the task of Named Entity Recognition (NER) and a lot of research has been done in this direction in last two decades. Particularly, a lot of progress has been made in the biomedical domain with emphasis on identifying domain-specific entities and often the task being known as Biological Named Entity Recognition (BER). The task of biological entity recognition (BER) has been proved to be a challenging task due to several reasons as identified by many researchers. The recognition of biological entities in text and the extraction of relationships between them have paved the way for doing more complex text-mining tasks and building further applications. This paper looks at the challenges perceived by the researchers in BER task and investigates the works done in the domain of BER by using the multiple approaches available for the task

    A survey on sentiment analysis in Urdu: A resource-poor language

    Get PDF
    © 2020 Background/introduction: The dawn of the internet opened the doors to the easy and widespread sharing of information on subject matters such as products, services, events and political opinions. While the volume of studies conducted on sentiment analysis is rapidly expanding, these studies mostly address English language concerns. The primary goal of this study is to present state-of-art survey for identifying the progress and shortcomings saddling Urdu sentiment analysis and propose rectifications. Methods: We described the advancements made thus far in this area by categorising the studies along three dimensions, namely: text pre-processing lexical resources and sentiment classification. These pre-processing operations include word segmentation, text cleaning, spell checking and part-of-speech tagging. An evaluation of sophisticated lexical resources including corpuses and lexicons was carried out, and investigations were conducted on sentiment analysis constructs such as opinion words, modifiers, negations. Results and conclusions: Performance is reported for each of the reviewed study. Based on experimental results and proposals forwarded through this paper provides the groundwork for further studies on Urdu sentiment analysis

    Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm

    Get PDF
    In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online to discover relevant information faster. In this research, the ATS methodology is proposed for the Hindi language using Real Coded Genetic Algorithm (RCGA) over the health corpus, available in the Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed where distinguishing features, namely- sentence similarity and named entity features are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosomes selection, and reproduction operators: Simulating Binary Crossover and Polynomial Mutation. To extract the highest scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%

    A machine learning approach for Urdu text sentiment analysis

    Get PDF
    Product evaluations, ratings, and other sorts of online expressions have risen in popularity as a result of the emergence of social networking sites and blogs. Sentiment analysis has emerged as a new area of study for computational linguists as a result of this rapidly expanding data set. From around a decade ago, this has been a topic of discussion for English speakers. However, the scientific community completely ignores other important languages, such as Urdu. Morphologically, Urdu is one of the most complex languages in the world. For this reason, a variety of unique characteristics, such as the language's unusual morphology and unrestricted word order, make the Urdu language processing a difficult challenge to solve. This research provides a new framework for the categorization of Urdu language sentiments. The main contributions of the research are to show how important this multidimensional research problem is as well as its technical parts, such as the parsing algorithm, corpus, lexicon, etc. A new approach for Urdu text sentiment analysis including data gathering, pre-processing, feature extraction, feature vector formation, and finally, sentiment classification has been designed to deal with Urdu language sentiments. The result and discussion section provides a comprehensive comparison of the proposed work with the standard baseline method in terms of precision, recall, f-measure, and accuracy of three different types of datasets. In the overall comparison of the models, the proposed work shows an encouraging achievement in terms of accuracy and other metrics. Last but not least, this section also provides the featured trend and possible direction of the current work
    • …
    corecore