806 research outputs found
Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which
determines what information in the source should be included in the summary.
This paper describes the use of machine learning on a training corpus of
documents and their abstracts to discover salience functions which describe
what combination of features is optimal for a given summarization task. The
method addresses both "generic" and user-focused summaries.Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98),
p. 821-82
Keyword Detection in Text Summarization
Summarization is the process of reducing a text document in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Extractive summary works on the given text to extract sentences that best convey the message hidden in the text. Most extractive summarization techniques revolve around the concept of indexing keywords and extracting sentences that have more keywords than the rest. Keyword extraction usually is done by extracting important words having a higher frequency than others, with stress on important. However the current techniques to handle this importance include a stop list which might include words that are critically important to the text. In this thesis, I present a work in progress to define an algorithm to extract truly significant keywords which might have lost its significance if subjected to the current keyword extraction algorithms
Keyphrase Generation: A Multi-Aspect Survey
Extractive keyphrase generation research has been around since the nineties,
but the more advanced abstractive approach based on the encoder-decoder
framework and sequence-to-sequence learning has been explored only recently. In
fact, more than a dozen of abstractive methods have been proposed in the last
three years, producing meaningful keyphrases and achieving state-of-the-art
scores. In this survey, we examine various aspects of the extractive keyphrase
generation methods and focus mostly on the more recent abstractive methods that
are based on neural networks. We pay particular attention to the mechanisms
that have driven the perfection of the later. A huge collection of scientific
article metadata and the corresponding keyphrases is created and released for
the research community. We also present various keyphrase generation and text
summarization research patterns and trends of the last two decades.Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th
Conference of the Open Innovations Association FRUCT, Helsinki, Finlan
Helmholtz Principle-Based Keyword Extraction
In today’s world of evolving technology, everybody wishes to accomplish tasks in least time. As information available online is perpetuating every day, it becomes very difficult to summarize any more than 100 documents in acceptable time. Thus, ”text summarization” is a challenging problem in the area of Natural Language Processing (NLP) especially in the context of global languages. In this thesis, we survey taxonomy of text summarization from different aspects. It briefly explains different approaches to summarization and the evaluation parameters. Also presented are a thorough details and facts about more than fifty automatic text summarization systems to ease the job of researchers and serve as a short encyclopedia for the investigated systems. Keyword extraction methods plays vital role in text mining and document processing. Keywords represent essential content of a document. Text mining applications take the advantage of keywords for processing documents. A quality Keyword is a word that represents the exact content of the text subsetly. It is very difficult to process large number of documents to get high quality keywords in acceptable time. This thesis gives a comparison between the most popular keyword extractions method, tf-idf and the proposed method that is based on Helmholtz Principle. Helmholtz Principle is based on the ideas from image processing and derived from the Gestalt theory of human perception. We also investigate the run time to extract the keywords by both the methods. Experimental results show that keyword extraction method based on Helmholtz Principle outperformancetf-idf
Keyword Detection in Text Summarization
Summarization is the process of reducing a text document in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Extractive summary works on the given text to extract sentences that best convey the message hidden in the text. Most extractive summarization techniques revolve around the concept of indexing keywords and extracting sentences that have more keywords than the rest. Keyword extraction usually is done by extracting important words having a higher frequency than others, with stress on important. However the current techniques to handle this importance include a stop list which might include words that are critically important to the text. In this thesis, I present a work in progress to define an algorithm to extract truly significant keywords which might have lost its significance if subjected to the current keyword extraction algorithms
Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)
This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory
2015) held in conjunction with the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP
2015) at the China National Convention Center in Beijing, on July 31st 2015.
Narratives are at the heart of information sharing. Ever since people began to share their experiences,
they have connected them to form narratives. The study od storytelling and the field of literary theory
called narratology have developed complex frameworks and models related to various aspects of
narrative such as plots structures, narrative embeddings, characters’ perspectives, reader response, point
of view, narrative voice, narrative goals, and many others. These notions from narratology have been
applied mainly in Artificial Intelligence and to model formal semantic approaches to narratives (e.g.
Plot Units developed by Lehnert (1981)). In recent years, computational narratology has qualified as an
autonomous field of study and research. Narrative has been the focus of a number of workshops and
conferences (AAAI Symposia, Interactive Storytelling Conference (ICIDS), Computational Models of
Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML
by Mani (2013)).
The workshop aimed at bringing together researchers from different communities working on
representing and extracting narrative structures in news, a text genre which is highly used in NLP
but which has received little attention with respect to narrative structure, representation and analysis.
Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic
extraction of events from single documents and work towards extracting story structures from multiple
documents, while these documents are published over time as news streams. Policy makers, NGOs,
information specialists (such as journalists and librarians) and others are increasingly in need of tools
that support them in finding salient stories in large amounts of information to more effectively implement
policies, monitor actions of “big players” in the society and check facts. Their tasks often revolve around
reconstructing cases either with respect to specific entities (e.g. person or organizations) or events (e.g.
hurricane Katrina). Storylines represent explanatory schemas that enable us to make better selections
of relevant information but also projections to the future. They form a valuable potential for exploiting
news data in an innovative way.JRC.G.2-Global security and crisis managemen
Semantic Annotation and Search: Bridging the Gap between Text, Knowledge and Language
In recent years, the ever-increasing quantities of entities in large knowledge bases on the Web, such as DBpedia, Freebase and YAGO, pose new challenges but at the same time open up new opportunities for intelligent information access. These knowledge bases (KBs) have become valuable resources in many research areas, such as natural language processing (NLP) and information retrieval (IR). Recently, almost every major commercial Web search engine has incorporated entities into their search process, including Google’s Knowledge Graph, Yahoo!’s Web of Objects and Microsoft’s Satori Graph/Bing Snapshots. The goal is to bridge the semantic gap between natural language text and formalized knowledge.
Within the context of globalization, multilingual and cross-lingual access to information has emerged as an issue of major interest. Nowadays, more and more people from different countries are connecting to the Internet, in particular the Web, and many users can understand more than one language. While the diversity of languages on the Web has been growing, for most people there is still very little content in their native language. As a consequence of the ability to understand more than one language, users are also interested in Web content in other languages than their mother tongue. There is an impending need for technologies that can help in overcoming the language barrier for multilingual and cross-lingual information access. In this thesis, we face the overall research question of how to allow for semantic-aware and cross-lingual processing of Web documents and user queries by leveraging knowledge bases.
With the goal of addressing this complex problem, we provide the following solutions: (1) semantic annotation for addressing the semantic gap between Web documents and knowledge; (2) semantic search for coping with the semantic gap between keyword queries and knowledge; (3) the exploitation of cross-lingual semantics for overcoming the language barrier between natural language expressions (i.e., keyword queries and Web documents) and knowledge for enabling cross-lingual semantic annotation and search. We evaluated these solutions and the results showed advances beyond the state-of-the-art. In addition, we implemented a framework of cross-lingual semantic annotation and search, which has been widely used for cross-lingual processing of media content in the context of our research projects
- …