MemSum: Extractive Summarization of Long Documents using Multi-step Episodic Markov Decision Processes
We introduce MemSum (Multi-step Episodic Markov decision process extractive SUMmarizer), a reinforcement-learning-based extractive summarizer that, at any given time step, is enriched with information on the current extraction history. Like previous models in this vein, MemSum iteratively selects sentences for the summary. Our innovation is to consider a broader information set when summarizing, one that humans would intuitively also use for this task: 1) the text content of the sentence, 2) the global text context of the rest of the document, and 3) the extraction history, consisting of the set of sentences that have already been extracted. Despite its lightweight architecture, MemSum obtains state-of-the-art test-set performance (ROUGE score) on long-document datasets (PubMed, arXiv, and GovReport). Supporting analysis demonstrates that the added awareness of extraction history makes MemSum robust to redundancy in the source document.
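MemSum learns its extraction policy with reinforcement learning, which is beyond the scope of a short example. The sketch below is a hypothetical, hand-crafted stand-in (closer to maximal marginal relevance than to MemSum itself) that only illustrates how the three information sources listed above can enter a multi-step selection loop, and why conditioning on the extraction history suppresses redundant sentences. The function names, the bag-of-words cosine similarity, `redundancy_weight`, and `stop_threshold` are all assumptions, not part of MemSum.

```python
import math
from collections import Counter


def bow(sentence: str) -> Counter:
    """Lower-cased bag-of-words vector for one sentence (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_with_history(sentences: list[str], max_sents: int = 3,
                         redundancy_weight: float = 1.0,
                         stop_threshold: float = 0.0) -> list[str]:
    """Greedy multi-step extraction: a hypothetical heuristic, NOT MemSum's RL policy.

    At each step a candidate is scored from
      1) its own content (its bag-of-words vector),
      2) the global context (similarity to the whole document), and
      3) the extraction history (max similarity to sentences already chosen).
    """
    vecs = [bow(s) for s in sentences]
    doc_vec = sum(vecs, Counter())  # global context: term counts of the whole document
    history: list[int] = []         # indices of sentences extracted so far

    while len(history) < max_sents:
        best_idx, best_score = None, float("-inf")
        for i, vec in enumerate(vecs):
            if i in history:
                continue
            relevance = cosine(vec, doc_vec)
            redundancy = max((cosine(vec, vecs[j]) for j in history), default=0.0)
            score = relevance - redundancy_weight * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        # Stop when nothing is left or the best candidate adds too little.
        if best_idx is None or best_score <= stop_threshold:
            break
        history.append(best_idx)

    # Return the summary in original document order.
    return [sentences[i] for i in sorted(history)]
```

With the default weight, a verbatim copy of an already-extracted sentence has redundancy 1.0 and relevance at most 1.0, so its score never exceeds 0 and the loop stops rather than extracting it a second time.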
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even with normalization, the curse of dimensionality can degrade the performance of summarizers on large texts. This paper describes a new method for normalizing words that further reduces the representation space. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but often dramatically improves the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used.
Comment: 22 pages, 12 figures, 9 tables
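A minimal sketch of the idea, assuming words are runs of letters and using a hypothetical truncation length `n` (with `n=1` keeping only the initial letter). It is not the authors' implementation; it only shows how truncation collapses the vocabulary on which a summarizer's vector space is built.

```python
import re


def ultra_stem(text: str, n: int = 1) -> list[str]:
    """Reduce each word to its first n letters (n is an assumed parameter).

    Words are taken as runs of Unicode letters, so accented characters
    (e.g. in French or Spanish text) are kept; digits and punctuation are dropped.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word[:n] for word in words]


def vocabulary_size(tokens: list[str]) -> int:
    """Number of distinct terms, i.e. the dimensionality of a bag-of-words space."""
    return len(set(tokens))
```

For example, `vocabulary_size(ultra_stem(text))` can be compared with the number of distinct lower-cased words in `text` to see how much the representation space shrinks for a given `n`.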
Accurate user directed summarization from existing tools
This paper describes a set of experimental results produced from the TIPSTER SUMMAC initiative on user-directed summaries: document summaries generated in the context of an information need expressed as a query. The summarizer that was evaluated was based on a set of existing statistical techniques that had been applied successfully to the INQUERY retrieval system. The techniques proved to have a wider utility, however, as the summarizer was one of the better-performing systems in the SUMMAC evaluation. The design of this summarizer is presented with a range of evaluations: both those provided by SUMMAC and a set of preliminary, more informal evaluations that examined additional aspects of the summaries. Among other conclusions, the results reveal that users can judge the relevance of documents from their summaries almost as accurately as if they had had access to the documents' full text.
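The abstract does not specify which INQUERY-derived statistics were used, so the following is only a generic illustration of query-biased extractive summarization: each sentence is scored by the query terms it contains, weighted by a simple sentence-level inverse document frequency, and the top `k` sentences are returned in their original order. The tokenizer, the scoring formula, and `k` are all assumptions, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_biased_summary(sentences: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k sentences that best match the query (hypothetical tf-idf scoring)."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence-level document frequency, used to down-weight terms found everywhere.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    def score(toks: list[str]) -> float:
        tf = Counter(toks)
        # The idf term log((n + 1) / (df + 0.5)) stays positive because df <= n.
        return sum(tf[t] * math.log((n + 1) / (df[t] + 0.5))
                   for t in query_terms if t in tf)

    top = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)[:k]
    # Keep the selected sentences in document order so the summary reads naturally.
    return [sentences[i] for i in sorted(top)]
```

Returning sentences in document order rather than score order is a common design choice for extracts, since it preserves the local flow of the original text.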
Identifying Privacy Policy in Service Terms Using Natural Language Processing
Ever since technology (tech) companies realized that data on people's activities, from mobile applications to the internet, could be sold to advertisers for a profit, the Big Data era began, with tech companies collecting as much data as possible from users. One benefit of this new era is the creation of new types of jobs, such as data scientists and Big Data engineers. However, this new era has also raised one of the hottest topics today: data privacy. A myriad of complaints have been raised about data privacy, such as how much access most mobile applications require to function correctly, from a user's contact list to media files. Furthermore, tracking has reached new heights, from mobile phone location and search engine activity to phone battery percentage. However much data is collected, the tech companies are within their rights to collect it because they provide a privacy policy that informs users of the type of data they collect, how they use that data, and how they share it. In addition, we find that all privacy policies used in this research state that, by using the mobile application, the user agrees to its terms and conditions. Most alarmingly, research on privacy policies has found that only 9% of mobile app users read legal terms and conditions [2] because they are too long, a worryingly low number. Therefore, in this thesis, we present two summarization programs that take privacy policy text as input and produce a shorter, summarized version of the policy. The results show that both implementations achieve averages of at least 50%, 90%, and 85% on the same-sentence, clear-sentence, and summary-score grading metrics, respectively.
- …