5,055 research outputs found
Feature Selection for Summarising: The Sunderland DUC 2004 Experience
In this paper we describe our participation in task 1-very short single-document summaries in DUC 2004. The task chosen is related to our research project, which aims to produce abstracting summaries to improve search engine result summaries. DUC allowed us to produce summaries no longer than 75 characters, therefore we focused on feature selection to produce a set of key words as summaries instead of complete sentences. Three descriptions of our summarisers are given. Each of the summarisers performs very differently in the six ROUGE metrics. One of our summarisers which uses a simple algorithm to produce summaries without any supervised learning or complicated NLP technique performs surprisingly well among different ROUGE evaluations. Finally we give an analysis of ROUGE and participants’ results. ROUGE is an automatic evaluation of summaries package, which uses n-gram matching to calculate the overlapping between machine and human summaries, and indeed saves time for human evaluation. However, the different ROUGE metrics give different results and it is hard to judge which is the best for automatic summaries evaluation. Also it does not include complete sentences evaluation. Therefore we suggest some work needs to be done on ROUGE in the future to make it really effective
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture
We present the architecture behind Twitter's real-time related query
suggestion and spelling correction service. Although these tasks have received
much attention in the web search literature, the Twitter context introduces a
real-time "twist": after significant breaking news events, we aim to provide
relevant results within minutes. This paper provides a case study illustrating
the challenges of real-time data processing in the era of "big data". We tell
the story of how our system was built twice: our first implementation was built
on a typical Hadoop-based analytics stack, but was later replaced because it
did not meet the latency requirements necessary to generate meaningful
real-time results. The second implementation, which is the system deployed in
production, is a custom in-memory processing engine specifically designed for
the task. This experience taught us that the current typical usage of Hadoop as
a "big data" platform, while great for experimentation, is not well suited to
low-latency processing, and points the way to future work on data analytics
platforms that can handle "big" as well as "fast" data
Text Summarization Techniques: A Brief Survey
In recent years, there has been a explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.Comment: Some of references format have update
- …