14 research outputs found
Copy-cat Bot for Narendra Modi which generates plausible new speeches in Modi's style using machine learning approaches
Many consequences in the human past can be traced back to one well-written, well-presented speech. Speeches hold the power to move nations or touch hearts as long as they are well thought out. This is why expertise in speech giving and speech writing is something we should all intend to gain. A copy-cat bot is a model that can learn the writing and talking style of a certain person and replicate it. The main objective of this research study is to apply a simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) recurrent neural networks, and Gated Recurrent Units (GRU) in developing a speech generation system that deep-learns one text and then generates new text. This research looks into the generation of English transcripts of Narendra Modi's speeches. The text generated using the LSTM and GRU models has great potential. The output produced by the RNN is less realistic and pragmatic, but its variants, LSTM and GRU, performed better. Though grammatical correctness and sentence transitions were absent in the text generated by LSTM and GRU, their output is somewhat logical compared to the RNN's. LSTM and GRU performed better as they generated more realistic text; their training loss is smaller, perplexity is lower, and mean probability is higher compared to the RNN.
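The abstract compares models by training loss, perplexity, and mean probability. As a minimal illustration of how the latter two metrics relate, the sketch below computes them from the per-token probabilities a language model assigns to a text; the probability values are made up for illustration, not taken from the study:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def mean_probability(token_probs):
    """Mean probability the model assigns to the observed tokens."""
    return sum(token_probs) / len(token_probs)

# Hypothetical per-token probabilities from two models on the same text:
rnn_probs  = [0.10, 0.05, 0.20, 0.08]
lstm_probs = [0.40, 0.30, 0.50, 0.35]

# A better model assigns higher mean probability and lower perplexity.
assert perplexity(lstm_probs) < perplexity(rnn_probs)
assert mean_probability(lstm_probs) > mean_probability(rnn_probs)
```

This shows why the two metrics move together: both are monotone summaries of the same per-token probabilities, one in log space and one in linear space.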
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
High-quality dialogue-summary paired data is expensive to produce and
domain-sensitive, making abstractive dialogue summarization a challenging task.
In this work, we propose the first unsupervised abstractive dialogue
summarization model for tete-a-tetes (SuTaT). Unlike standard text
summarization, a dialogue summarization method should consider the
multi-speaker scenario where the speakers have different roles, goals, and
language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT
aims to summarize for each speaker by modeling the customer utterances and the
agent utterances separately while retaining their correlations. SuTaT consists
of a conditional generative module and two unsupervised summarization modules.
The conditional generative module contains two encoders and two decoders in a
variational autoencoder framework where the dependencies between two latent
spaces are captured. With the same encoders and decoders, two unsupervised
summarization modules equipped with sentence-level self-attention mechanisms
generate summaries without using any annotations. Experimental results show
that SuTaT is superior at unsupervised dialogue summarization under both
automatic and human evaluations, and is capable of dialogue classification and
single-turn conversation generation.
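As a rough illustration of the sentence-level self-attention idea mentioned above (not SuTaT's actual trained variational-autoencoder architecture), the sketch below scores utterances by how much attention they receive from the others, assuming sentence vectors have already been computed by some encoder:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sentence_salience(vectors):
    """Each sentence attends to all sentences via softmax over dot products;
    a sentence's salience is the total attention it receives."""
    n = len(vectors)
    received = [0.0] * n
    for i in range(n):
        weights = softmax([dot(vectors[i], vectors[j]) for j in range(n)])
        for j in range(n):
            received[j] += weights[j]
    return received

# Toy sentence embeddings: the first two are similar, the third is an outlier.
scores = sentence_salience([[1.0, 1.0], [0.9, 1.0], [-1.0, -1.0]])
best = scores.index(max(scores))  # utterance receiving the most attention
```

Sentences that many other sentences attend to accumulate high scores, which is one simple way an attention mechanism can surface summary-worthy utterances without annotations.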
Automatically Evaluating Opinion Prevalence in Opinion Summarization
When faced with a large number of product reviews, it is not clear that a
human can remember all of them and weight opinions representatively to write a
good reference summary. We propose an automatic metric to test the prevalence
of the opinions that a summary expresses, based on counting the number of
reviews that are consistent with each statement in the summary, while
discrediting trivial or redundant statements. To formulate this opinion
prevalence metric, we consider several existing methods to score the factual
consistency of a summary statement with respect to each individual source
review. On a corpus of Amazon product reviews, we gather multiple human
judgments of the opinion consistency, to determine which automatic metric best
expresses consistency in product reviews. Using the resulting opinion
prevalence metric, we show that a human authored summary has only slightly
better opinion prevalence than randomly selected extracts from the source
reviews, and previous extractive and abstractive unsupervised opinion
summarization methods perform worse than humans. We demonstrate room for
improvement with a greedy construction of extractive summaries with twice the
opinion prevalence achieved by humans. Finally, we show that preprocessing
source reviews by simplification can raise the opinion prevalence achieved by
existing abstractive opinion summarization systems to the level of human
performance. Comment: The 6th Workshop on e-Commerce and NLP (KDD 2023)
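The core of the proposed metric, as described, is counting how many source reviews are consistent with each summary statement while discounting redundant statements. The sketch below implements that counting logic with a simple word-overlap stand-in for the factual-consistency scorers the paper actually compares; both the threshold and the overlap heuristic are illustrative:

```python
def consistent(statement, review, threshold=0.5):
    """Placeholder consistency check: fraction of statement words that
    appear in the review. The paper evaluates real factual-consistency
    scorers; this overlap heuristic only stands in for them."""
    words = set(statement.lower().split())
    return len(words & set(review.lower().split())) / len(words) >= threshold

def opinion_prevalence(summary_statements, reviews):
    """Average, over distinct statements, of the fraction of reviews
    consistent with each statement; repeated statements count once."""
    seen = set()
    scores = []
    for stmt in summary_statements:
        key = stmt.lower()
        if key in seen:          # discredit redundant statements
            continue
        seen.add(key)
        support = sum(consistent(stmt, r) for r in reviews)
        scores.append(support / len(reviews))
    return sum(scores) / len(scores) if scores else 0.0

reviews = [
    "battery life is great and it charges fast",
    "great battery life but the screen is dim",
    "the screen is dim and colors look washed out",
]
summary = ["battery life is great", "the screen is dim", "battery life is great"]
score = opinion_prevalence(summary, reviews)  # duplicate statement ignored
```

A summary earns a high score only when its statements are each backed by many reviews, which is why randomly selected extracts can compete with a human summary that covers rarer opinions.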
MULTI-DOCUMENT SUMMARIZATION USING A COMBINATION OF FEATURES BASED ON CENTROID AND KEYWORD
Summarizing text across multiple documents requires choosing important sentences, which is more complex than in a single document because differing information across documents leads to contradictions and redundancy. The selection of important sentences can be done by scoring sentences with respect to the main information. A combination of features is used in the sentence-scoring process so that sentences with high scores become candidates for the summary. The centroid approach provides an advantage in capturing key information; however, it is limited to information close to the center point. Adding a position feature supplies further evidence of a sentence's importance, but position features focus only on the main positions. Therefore, the researchers use a keyword feature, as the contribution of this study, which provides additional information on important words in the form of N-grams in a document. In this study, the centroid, position, and keyword features were combined in a scoring process that improves performance on multi-document news and review data. The test results show that adding the keyword feature produces the highest values on the DUC2004 news data: ROUGE-1 of 35.44, ROUGE-2 of 7.64, ROUGE-L of 37.02, and BERTScore of 84.22. On the Amazon review data, the results were ROUGE-1 of 32.24, ROUGE-2 of 6.14, ROUGE-L of 34.77, and BERTScore of 85.75. The ROUGE and BERTScore values outperform the other unsupervised models.
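The abstract describes scoring each sentence by a combination of centroid, position, and keyword features. The sketch below shows one hypothetical way such a weighted combination could look; the weights, the top-5 centroid size, and the overlap measures are assumptions for illustration, not the paper's actual formulas:

```python
from collections import Counter

def score_sentences(sentences, keyword_ngrams,
                    w_centroid=0.5, w_position=0.2, w_keyword=0.3):
    """Weighted sum of three features per sentence:
    (1) centroid: overlap with the most frequent words in the collection,
    (2) position: earlier sentences score higher,
    (3) keyword: coverage of the given keyword n-grams."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    centroid = {w for w, _ in counts.most_common(5)}  # assumed centroid size
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        words = set(s.lower().split())
        c = len(centroid & words) / len(centroid)
        p = (n - i) / n
        k = sum(ng in s.lower() for ng in keyword_ngrams) / len(keyword_ngrams)
        scores.append(w_centroid * c + w_position * p + w_keyword * k)
    return scores

sentences = [
    "the battery life of this phone is great",
    "shipping was slow",
    "battery life lasts two days on this phone",
]
scores = score_sentences(sentences, keyword_ngrams=["battery life"])
top = scores.index(max(scores))  # highest-scoring summary candidate
```

The keyword term rewards sentences containing important n-grams even when they sit far from the centroid or outside the lead positions, which is the gap the paper's contribution targets.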
Abstractive Opinion Tagging
In e-commerce, opinion tags refer to a ranked list of tags provided by the
e-commerce platform that reflect characteristics of reviews of an item. To
assist consumers to quickly grasp a large number of reviews about an item,
opinion tags are increasingly being applied by e-commerce platforms. Current
mechanisms for generating opinion tags rely on either manual labelling or
heuristic methods, which are time-consuming and ineffective. In this paper, we
propose the abstractive opinion tagging task, where systems have to
automatically generate a ranked list of opinion tags that are based on, but
need not occur in, a given set of user-generated reviews.
The abstractive opinion tagging task comes with three main challenges: (1)
the noisy nature of reviews; (2) the formal nature of opinion tags vs. the
colloquial language usage in reviews; and (3) the need to distinguish between
different items with very similar aspects. To address these challenges, we
propose an abstractive opinion tagging framework, named AOT-Net, to generate a
ranked list of opinion tags given a large number of reviews. First, a
sentence-level salience estimation component estimates each review's salience
score. Next, a review clustering and ranking component ranks reviews in two
steps: first, reviews are grouped into clusters and ranked by cluster size;
then, reviews within each cluster are ranked by their distance to the cluster
center. Finally, given the ranked reviews, a rank-aware opinion tagging
component incorporates an alignment feature and alignment loss to generate a
ranked list of opinion tags. To facilitate the study of this task, we create
and release a large-scale dataset, called eComTag, crawled from real-world
e-commerce websites. Extensive experiments conducted on the eComTag dataset
verify the effectiveness of the proposed AOT-Net in terms of various evaluation
metrics. Comment: Accepted by WSDM 202
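The review clustering and ranking component is described as a two-step procedure: order clusters by size, then order reviews within each cluster by distance to the cluster center. Assuming reviews are already embedded as vectors and cluster assignments are given, that step can be sketched as follows (function and variable names are illustrative, not from the paper):

```python
import math

def rank_reviews(review_vectors, assignments):
    """Return review indices ranked as described: clusters ordered by size
    (largest first), reviews within a cluster ordered by distance to the
    cluster center (closest first)."""
    clusters = {}
    for idx, c in enumerate(assignments):
        clusters.setdefault(c, []).append(idx)
    dim = len(review_vectors[0])
    ranked = []
    for _, members in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
        center = [sum(review_vectors[m][d] for m in members) / len(members)
                  for d in range(dim)]
        members.sort(key=lambda m: math.dist(review_vectors[m], center))
        ranked.extend(members)
    return ranked

# Three reviews in one cluster, one outlier in another:
ranked = rank_reviews([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [1.0, 0.0]],
                      [0, 0, 1, 0])
```

Ranking by cluster size puts widely shared opinions first, and ranking by distance to the center puts the most representative review of each opinion first, which is the ordering the downstream rank-aware tagging component consumes.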