87 research outputs found

    A Pattern-Growth Sentence Compression Technique For Malay Text Summarizer

    Automatic Text Summarization (ATS) has benefited users by identifying and extracting the most salient information from a given text with less effort. The application of Sentence Compression (SC) in ATS is to remove unimportant constituents from a summary sentence while preserving the salient ones and keeping the sentence’s grammar intact. Most previous SC techniques depend heavily on syntactic rules and knowledge applied to individual words or phrases to guide the removal decision. Despite their ability to produce a new grammatical compressed sentence, prior approaches still suffer from several drawbacks, including the failure to include some significant and relevant sentences in constructing the final summary sentence.

    Assessment, Implication, and Analysis of Online Consumer Reviews: A Literature Review

    The rise of e-marketplaces, virtual communities, and social networking has increased the influence of online consumer reviews (OCR) and therefore calls for a consolidation of the body of knowledge. This article attempts to conceptually cluster academic literature in both the management and technical domains. The study follows a framework that broadly clusters management research under two heads: OCR Assessment and OCR Implication (business implication). Parallel technical literature has been reviewed to reconcile the methodologies adopted in the analysis of text content on the web, mainly reviews. Text mining through automated tools, algorithmic contributions (dominant mainly in technical-stream literature), and manual assessment (derived from the stream of content analysis) have been studied in this review article. The literature of both domains is analyzed to propose possible areas for further research. Using text analysis methods along with statistical and data mining techniques to analyze review text, and applying the resulting knowledge to managerial issues, can possibly constitute further work. Available at: https://aisel.aisnet.org/pajais/vol9/iss2/4

    Beyond Extractive: Advancing Abstractive Automatic Text Summarization in Norwegian with Transformers

    Automatic summarization is a key area in natural language processing (NLP) and machine learning which attempts to generate informative summaries of articles and documents. Despite its evolution since the 1950s, research on automatically summarizing Norwegian text has remained relatively underdeveloped. Though some strides have been made in extractive systems, which generate summaries by selecting and condensing key phrases directly from the source material, the field of abstractive summarization remains unexplored for the Norwegian language. Abstractive summarization is distinct in that it generates summaries incorporating new words and phrases not present in the original text. This Master's thesis revolves around one key question: Is it possible to create a machine learning system capable of performing abstractive summarization in Norwegian? To answer this question, we generate and release the first two Norwegian datasets for creating and evaluating Norwegian summarization models. One of these datasets is a web scrape of Store Norske Leksikon (SNL), and the other is a machine-translated version of CNN/Daily Mail. Using these datasets, we fine-tune two Norwegian T5 language models with 580M and 1.2B parameters to create summaries. To assess the quality of the models, we employed both automatic ROUGE scores and human evaluations on the generated summaries. In an effort to better understand the models' behaviour, we measure how a model generates summaries with various metrics, including our own novel contribution, which we name "Match Ratio" and which measures sentence similarities between summaries and articles based on Levenshtein distances. The top-performing models achieved ROUGE-1 scores of 35.07 and 34.02 on SNL and CNN/DM, respectively. In terms of human evaluation, the best model yielded an average score of 3.96/5.00 for SNL and 4.64/5.00 for CNN/Daily Mail across various criteria.
    Based on these results, we conclude that it is possible to perform abstractive summarization of Norwegian with high-quality summaries. With this research, we have laid a foundation that hopefully will facilitate future research, empowering others to build upon our findings and contribute further to the development of Norwegian summarization models.
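    The abstract above reports ROUGE-1 scores. As a rough illustration of what that metric measures, the following is a minimal sketch of unigram-overlap precision, recall, and F1. It assumes plain lowercased whitespace tokenization and is not the thesis's evaluation code; the function name `rouge1` is hypothetical, and standard ROUGE implementations typically add their own tokenization and optional stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Multiset intersection keeps, per word, the smaller of the two counts.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

    In this toy call every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5.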

    Hierarchical organization of consumer reviews for products and its applications

    Ph.D., Doctor of Philosophy

    QMOS: Query-based multi-documents opinion-oriented summarization

    Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which combines sentiment analysis and summarization approaches. It is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. QMOS combines multiple sentiment dictionaries to overcome the limited word coverage of any individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word given by a lexicon and the word's polarity in a specific context. This is because the polarity of a word depends on the context in which it is used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle these challenges, QMOS integrates multiple strategies to adjust a word's prior sentiment orientation while also considering the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word that is not included in a sentiment lexicon. In addition, most existing methods fail to distinguish between the meaning of a review sentence and that of the user's query when the two share a similar bag of words; hence there is often a conflict between the extracted opinionated sentences and users’ needs. The summarization phase of QMOS, by contrast, is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs a greedy algorithm and a query expansion approach to, respectively, reduce redundancy and bridge the lexical gaps between similar contexts that are expressed using different wording. Our experiment shows that the QMOS method significantly improves performance and is comparable to other existing methods.

    Medical text simplification: bridging the gap between medical research and public understanding

    Text Simplification is a subdomain of Natural Language Processing that focuses on applying computational techniques to modify the content and structure of a text to make it easier to interpret while retaining the main idea. Advances in text simplification research have provided valuable benefits to a wide range of readers, including those with learning disabilities and non-native speakers. Moreover, even regular readers who are not experts in fields such as medicine or finance have found text simplification techniques useful in accessing scientific literature and research. This thesis aims to create a text simplification approach that can effectively simplify complex biomedical literature. Chapter 2 provides an overview of the datasets, methods, and evaluation techniques used in text simplification. Chapter 3 conducts an extensive bibliometric analysis of the text simplification literature to understand research trends, identify important research and application topics, and understand shortcomings in the field. The findings in Chapter 3 indicate that advances in text simplification research can have a positive impact on the medical domain. Research in the field of medicine is constantly developing and contains important information about drugs and treatments for various life-threatening diseases. Although this information is accessible to the public, it is very complex in nature, which makes it difficult to understand.