14 research outputs found
Copy-cat Bot for Narendra Modi which generates plausible new speeches in Modi's style using machine learning approaches
Many consequences in the human past can be traced back to one well-written, well-presented speech. Speeches hold the power to move nations or touch hearts as long as they are well thought out. This is why expertise in speech giving and speech writing is something we should all intend to gain. A copy-cat bot is a model that can learn the writing and talking style of a certain person and replicate it. The main objective of this research study is to apply a simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) recurrent neural networks, and Gated Recurrent Units (GRU) in developing a speech generation system that deep-learns one text and then generates new text. This research looks into the generation of English transcripts of Narendra Modi's speeches. The text generated using the LSTM and GRU models has great potential. The output produced by the RNN is less realistic and pragmatic, but its variants, LSTM and GRU, performed better. Though grammatical correctness and sentence transitions were absent in the text generated by LSTM and GRU, their output is somewhat logical compared to the RNN's. LSTM and GRU performed better as they generated more realistic text; their training loss is smaller, perplexity is lower, and mean probability is higher compared to the RNN.
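The abstract compares models by training loss, perplexity, and mean probability. As a minimal illustration of how the latter two metrics relate, the sketch below computes them from the per-token probabilities a language model assigns to a text; the probability values are made up for illustration, not taken from the study:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def mean_probability(token_probs):
    """Mean probability the model assigns to the observed tokens."""
    return sum(token_probs) / len(token_probs)

# Hypothetical per-token probabilities from two models on the same text:
rnn_probs  = [0.10, 0.05, 0.20, 0.08]
lstm_probs = [0.40, 0.30, 0.50, 0.35]

# A better model assigns higher mean probability and lower perplexity.
assert perplexity(lstm_probs) < perplexity(rnn_probs)
assert mean_probability(lstm_probs) > mean_probability(rnn_probs)
```

This shows why the two metrics move together: both are monotone summaries of the same per-token probabilities, one in log space and one in linear space.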
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
High-quality dialogue-summary paired data is expensive to produce and
domain-sensitive, making abstractive dialogue summarization a challenging task.
In this work, we propose the first unsupervised abstractive dialogue
summarization model for tete-a-tetes (SuTaT). Unlike standard text
summarization, a dialogue summarization method should consider the
multi-speaker scenario where the speakers have different roles, goals, and
language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT
aims to summarize for each speaker by modeling the customer utterances and the
agent utterances separately while retaining their correlations. SuTaT consists
of a conditional generative module and two unsupervised summarization modules.
The conditional generative module contains two encoders and two decoders in a
variational autoencoder framework where the dependencies between two latent
spaces are captured. With the same encoders and decoders, two unsupervised
summarization modules equipped with sentence-level self-attention mechanisms
generate summaries without using any annotations. Experimental results show
that SuTaT is superior at unsupervised dialogue summarization under both
automatic and human evaluations, and is capable of dialogue classification and
single-turn conversation generation.
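As a rough illustration of the sentence-level self-attention idea mentioned above (not SuTaT's actual trained variational-autoencoder architecture), the sketch below scores utterances by how much attention they receive from the others, assuming sentence vectors have already been computed by some encoder:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sentence_salience(vectors):
    """Each sentence attends to all sentences via softmax over dot products;
    a sentence's salience is the total attention it receives."""
    n = len(vectors)
    received = [0.0] * n
    for i in range(n):
        weights = softmax([dot(vectors[i], vectors[j]) for j in range(n)])
        for j in range(n):
            received[j] += weights[j]
    return received

# Toy sentence embeddings: the first two are similar, the third is an outlier.
scores = sentence_salience([[1.0, 1.0], [0.9, 1.0], [-1.0, -1.0]])
best = scores.index(max(scores))  # utterance receiving the most attention
```

Sentences that many other sentences attend to accumulate high scores, which is one simple way an attention mechanism can surface summary-worthy utterances without annotations.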
Automatically Evaluating Opinion Prevalence in Opinion Summarization
When faced with a large number of product reviews, it is not clear that a
human can remember all of them and weight opinions representatively to write a
good reference summary. We propose an automatic metric to test the prevalence
of the opinions that a summary expresses, based on counting the number of
reviews that are consistent with each statement in the summary, while
discrediting trivial or redundant statements. To formulate this opinion
prevalence metric, we consider several existing methods to score the factual
consistency of a summary statement with respect to each individual source
review. On a corpus of Amazon product reviews, we gather multiple human
judgments of the opinion consistency, to determine which automatic metric best
expresses consistency in product reviews. Using the resulting opinion
prevalence metric, we show that a human authored summary has only slightly
better opinion prevalence than randomly selected extracts from the source
reviews, and previous extractive and abstractive unsupervised opinion
summarization methods perform worse than humans. We demonstrate room for
improvement with a greedy construction of extractive summaries with twice the
opinion prevalence achieved by humans. Finally, we show that preprocessing
source reviews by simplification can raise the opinion prevalence achieved by
existing abstractive opinion summarization systems to the level of human
performance. Comment: The 6th Workshop on e-Commerce and NLP (KDD 2023)
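The core of the proposed metric, as described, is counting how many source reviews are consistent with each summary statement while discounting redundant statements. The sketch below implements that counting logic with a simple word-overlap stand-in for the factual-consistency scorers the paper actually compares; both the threshold and the overlap heuristic are illustrative:

```python
def consistent(statement, review, threshold=0.5):
    """Placeholder consistency check: fraction of statement words that
    appear in the review. The paper evaluates real factual-consistency
    scorers; this overlap heuristic only stands in for them."""
    words = set(statement.lower().split())
    return len(words & set(review.lower().split())) / len(words) >= threshold

def opinion_prevalence(summary_statements, reviews):
    """Average, over distinct statements, of the fraction of reviews
    consistent with each statement; repeated statements count once."""
    seen = set()
    scores = []
    for stmt in summary_statements:
        key = stmt.lower()
        if key in seen:          # discredit redundant statements
            continue
        seen.add(key)
        support = sum(consistent(stmt, r) for r in reviews)
        scores.append(support / len(reviews))
    return sum(scores) / len(scores) if scores else 0.0

reviews = [
    "battery life is great and it charges fast",
    "great battery life but the screen is dim",
    "the screen is dim and colors look washed out",
]
summary = ["battery life is great", "the screen is dim", "battery life is great"]
score = opinion_prevalence(summary, reviews)  # duplicate statement ignored
```

A summary earns a high score only when its statements are each backed by many reviews, which is why randomly selected extracts can compete with a human summary that covers rarer opinions.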
MULTI-DOCUMENT SUMMARIZATION USING A COMBINATION OF FEATURES BASED ON CENTROID AND KEYWORD
Summarizing text across multiple documents requires choosing important sentences, which is more complex than in a single document because differing information across documents leads to contradictions and redundancy. The selection of important sentences can be done by scoring sentences with respect to the main information. A combination of features is used in the sentence-scoring process so that sentences with high scores become candidates for the summary. The centroid approach provides an advantage in capturing key information; however, it is limited to information close to the center point. Adding a position feature supplies further evidence of a sentence's importance, but position features focus only on the main positions. Therefore, the researchers use a keyword feature, as the contribution of this study, which provides additional information on important words in the form of N-grams in a document. In this study, the centroid, position, and keyword features were combined in a scoring process that improves performance on multi-document news and review data. The test results show that adding the keyword feature produces the highest values on the DUC2004 news data: ROUGE-1 of 35.44, ROUGE-2 of 7.64, ROUGE-L of 37.02, and BERTScore of 84.22. On the Amazon review data, the results were ROUGE-1 of 32.24, ROUGE-2 of 6.14, ROUGE-L of 34.77, and BERTScore of 85.75. The ROUGE and BERTScore values outperform the other unsupervised models.
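The abstract describes scoring each sentence by a combination of centroid, position, and keyword features. The sketch below shows one hypothetical way such a weighted combination could look; the weights, the top-5 centroid size, and the overlap measures are assumptions for illustration, not the paper's actual formulas:

```python
from collections import Counter

def score_sentences(sentences, keyword_ngrams,
                    w_centroid=0.5, w_position=0.2, w_keyword=0.3):
    """Weighted sum of three features per sentence:
    (1) centroid: overlap with the most frequent words in the collection,
    (2) position: earlier sentences score higher,
    (3) keyword: coverage of the given keyword n-grams."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    centroid = {w for w, _ in counts.most_common(5)}  # assumed centroid size
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        words = set(s.lower().split())
        c = len(centroid & words) / len(centroid)
        p = (n - i) / n
        k = sum(ng in s.lower() for ng in keyword_ngrams) / len(keyword_ngrams)
        scores.append(w_centroid * c + w_position * p + w_keyword * k)
    return scores

sentences = [
    "the battery life of this phone is great",
    "shipping was slow",
    "battery life lasts two days on this phone",
]
scores = score_sentences(sentences, keyword_ngrams=["battery life"])
top = scores.index(max(scores))  # highest-scoring summary candidate
```

The keyword term rewards sentences containing important n-grams even when they sit far from the centroid or outside the lead positions, which is the gap the paper's contribution targets.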
Abstractive Opinion Tagging
In e-commerce, opinion tags refer to a ranked list of tags provided by the
e-commerce platform that reflect characteristics of reviews of an item. To
assist consumers to quickly grasp a large number of reviews about an item,
opinion tags are increasingly being applied by e-commerce platforms. Current
mechanisms for generating opinion tags rely on either manual labelling or
heuristic methods, which are time-consuming and ineffective. In this paper, we
propose the abstractive opinion tagging task, where systems have to
automatically generate a ranked list of opinion tags that are based on, but
need not occur in, a given set of user-generated reviews.
The abstractive opinion tagging task comes with three main challenges: (1)
the noisy nature of reviews; (2) the formal nature of opinion tags vs. the
colloquial language usage in reviews; and (3) the need to distinguish between
different items with very similar aspects. To address these challenges, we
propose an abstractive opinion tagging framework, named AOT-Net, to generate a
ranked list of opinion tags given a large number of reviews. First, a
sentence-level salience estimation component estimates each review's salience
score. Next, a review clustering and ranking component ranks reviews in two
steps: first, reviews are grouped into clusters and ranked by cluster size;
then, reviews within each cluster are ranked by their distance to the cluster
center. Finally, given the ranked reviews, a rank-aware opinion tagging
component incorporates an alignment feature and alignment loss to generate a
ranked list of opinion tags. To facilitate the study of this task, we create
and release a large-scale dataset, called eComTag, crawled from real-world
e-commerce websites. Extensive experiments conducted on the eComTag dataset
verify the effectiveness of the proposed AOT-Net in terms of various evaluation
metrics. Comment: Accepted by WSDM 202
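The review clustering and ranking component is described as a two-step procedure: order clusters by size, then order reviews within each cluster by distance to the cluster center. Assuming reviews are already embedded as vectors and cluster assignments are given, that step can be sketched as follows (function and variable names are illustrative, not from the paper):

```python
import math

def rank_reviews(review_vectors, assignments):
    """Return review indices ranked as described: clusters ordered by size
    (largest first), reviews within a cluster ordered by distance to the
    cluster center (closest first)."""
    clusters = {}
    for idx, c in enumerate(assignments):
        clusters.setdefault(c, []).append(idx)
    dim = len(review_vectors[0])
    ranked = []
    for _, members in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
        center = [sum(review_vectors[m][d] for m in members) / len(members)
                  for d in range(dim)]
        members.sort(key=lambda m: math.dist(review_vectors[m], center))
        ranked.extend(members)
    return ranked

# Three reviews in one cluster, one outlier in another:
ranked = rank_reviews([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [1.0, 0.0]],
                      [0, 0, 1, 0])
```

Ranking by cluster size puts widely shared opinions first, and ranking by distance to the center puts the most representative review of each opinion first, which is the ordering the downstream rank-aware tagging component consumes.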