Extractive text summarisation using graph triangle counting approach: proposed method
Currently, with a growing quantity of digital text data, building summarisation systems has become vital. Summarisation systems capture and condense the most important ideas of documents and help users find and understand the main facts of a text more quickly and easily. An important class of such systems creates summaries from extracts. This type of summary, called extractive summarisation, is produced by selecting significant fragments of the text without making any amendment to the original. One methodology for generating this type of summary uses graph theory. Graph theory includes a field called graph pruning or reduction, which aims to find the best representation of the main graph with fewer nodes and edges. In this paper, a graph reduction technique called the triangle counting approach is presented to choose the most important sentences of the text. The first phase is to represent a text as a graph, where nodes are the sentences and edges are the similarities between sentences. The second phase is to construct the triangles, the third is to build a bit vector representation, and the final phase is to retrieve the sentences based on the values of the bit vector.
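The four phases described above can be illustrated with a small Python sketch. The abstract does not say how sentence similarity is measured or exactly how the bit vector is derived from the triangles, so this sketch assumes bag-of-words cosine similarity with a hypothetical edge threshold of 0.1, and a bit vector in which bit i is set when sentence i lies on at least one triangle. The function names (`build_graph`, `triangles`, `bit_vector`, `summarise`) are illustrative and not the authors' implementation.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokenize(sentence):
    # Lowercase alphanumeric tokens (a deliberately simple assumption).
    return re.findall(r"[a-z0-9]+", sentence.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_graph(sentences, threshold=0.1):
    # Phase 1: nodes are sentence indices; an edge joins two sentences whose
    # similarity exceeds the (hypothetical) threshold.
    bags = [Counter(tokenize(s)) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if cosine(bags[i], bags[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def triangles(adj):
    # Phase 2: list each triangle exactly once as a triple i < j < k.
    found = []
    for i in adj:
        for j in adj[i]:
            if j > i:
                for k in adj[i] & adj[j]:
                    if k > j:
                        found.append((i, j, k))
    return found

def bit_vector(n, tris):
    # Phase 3 (assumed form): bit i is 1 if sentence i lies on any triangle.
    bits = [0] * n
    for tri in tris:
        for node in tri:
            bits[node] = 1
    return bits

def summarise(sentences, threshold=0.1):
    # Phase 4: keep sentences whose bit is set, in their original order.
    adj = build_graph(sentences, threshold)
    bits = bit_vector(len(sentences), triangles(adj))
    return [s for s, b in zip(sentences, bits) if b]
```

For example, three sentences that share vocabulary pairwise form a triangle and are kept, while an unrelated fourth sentence has no edges and is dropped. The assumed selection rule reflects the intuition that a sentence on a triangle is supported by two mutually similar sentences, a stronger signal than a single similarity edge.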
Towards Personalized and Human-in-the-Loop Document Summarization
The ubiquitous availability of computing devices and the widespread use of
the internet continuously generate large amounts of data. As a result, the
amount of available information on any given topic is far beyond humans'
capacity to process it properly, causing what is known as information
overload. To cope efficiently with large amounts of information and generate
content of significant value to users, we need to identify, merge and
summarise information. Data summaries can help gather related information and
collect it into a shorter format that enables answering complicated questions,
gaining new insight and discovering conceptual boundaries.
This thesis focuses on three main challenges in alleviating information
overload using novel summarisation techniques. It further aims to facilitate
the analysis of documents to support personalised information extraction. The
thesis divides the research issues into four areas, covering (i) feature
engineering in document summarisation, (ii) traditional static and inflexible
summaries, (iii) traditional generic summarisation approaches, and (iv) the
need for reference summaries. We propose novel approaches to tackle these
challenges by (i) enabling automatic intelligent feature engineering, (ii)
enabling flexible and interactive summarisation, and (iii) utilising intelligent and
personalised summarisation approaches. The experimental results demonstrate the
efficiency of the proposed approaches compared to other state-of-the-art
models. We further propose solutions to the information overload problem in
different domains through summarisation, covering network traffic data, health
data and business process data.
Comment: PhD thesis
POLIS: a probabilistic summarisation logic for structured documents
PhD
As the availability of structured documents, formatted in markup languages such as SGML, RDF,
or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements,
rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval
logics have allowed developers to include search facilities into numerous applications, without
the need for detailed knowledge of retrieval models.
Although automatic document summarisation has been recognised as a useful tool for reducing
the workload of information system users, very few such abstraction layers have been developed
for the task of automatic document summarisation. This thesis describes the development
of an abstraction logic for summarisation, called POLIS, which provides users (such as developers
or knowledge engineers) with high-level access to summarisation facilities. Furthermore,
POLIS allows users to exploit the hierarchical information provided by structured documents.
The development of POLIS is carried out in a step-by-step way. We start by defining a series
of probabilistic summarisation models, which provide weights to document-elements at a
user-selected level. These summarisation models are those accessible through POLIS. The formal
definition of POLIS is performed in three steps. We start by providing a syntax for POLIS,
through which users/knowledge engineers interact with the logic. This is followed by a definition
of the logic's semantics. Finally, we provide details of an implementation of POLIS.
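To make the idea of weighting document-elements at a user-selected level concrete, here is a minimal Python sketch of one probabilistic model. It is not one of POLIS's own models, which the abstract does not define. It assumes a standard query-likelihood score with Jelinek-Mercer smoothing, a uniform prior over elements, and a hypothetical function `element_weights` that treats an XML tag name as the "level". The hierarchy enters because each element's text includes the text of its descendants.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

def terms(text):
    # Lowercase alphanumeric tokens (simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def element_weights(xml_string, level_tag, query, lam=0.5):
    """Weight every element tagged `level_tag` by smoothed query likelihood.

    Returns weights in document order, normalised to sum to 1.
    Assumes 0 <= lam < 1 so that every smoothed probability stays positive.
    """
    root = ET.fromstring(xml_string)
    # Whole-document (collection) model used for smoothing.
    coll = Counter(terms(" ".join(root.itertext())))
    coll_len = sum(coll.values())
    # Drop query terms absent from the document to avoid log(0).
    q_terms = [q for q in terms(query) if coll[q] > 0]
    log_scores = []
    for elem in root.iter(level_tag):
        # itertext() includes all descendants, so an element inherits the
        # terms of its children: this is where the hierarchy is used.
        tf = Counter(terms(" ".join(elem.itertext())))
        length = sum(tf.values()) or 1
        log_scores.append(sum(
            math.log(lam * tf[q] / length + (1 - lam) * coll[q] / coll_len)
            for q in q_terms))
    if not log_scores:
        return []
    # With a uniform prior, P(element | query) is proportional to the
    # likelihood, so normalise the likelihoods (a softmax over log scores).
    top = max(log_scores)
    exp_scores = [math.exp(s - top) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]
```

On a document with one `sec` element about graph summarisation and one about the weather, `element_weights(doc, "sec", "graph summarisation")` gives the first section most of the probability mass. The parameter `lam` controls how strongly an element's own term statistics outweigh the whole-document statistics.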
The final chapters of this dissertation are concerned with the evaluation of POLIS, which is
conducted in two stages. Firstly, we evaluate the performance of the summarisation models by
applying POLIS to two test collections: the DUC AQUAINT corpus and the INEX IEEE corpus.
This is followed by application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks.
A literature survey of methods for analysis of subjective language
Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, the analysis of subjective aspects of natural language had until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in this area.
Speech to text conversion and summarization for effective understanding and documentation
Speech is the most powerful means of communication, through which human beings express their thoughts and feelings in different languages. The features of speech differ from language to language. Moreover, even within the same language, pace and dialect vary from person to person, which makes the conveyed message difficult for some listeners to understand. Lengthy speeches can also be hard to follow because of differences in pronunciation, pace and so on. Speech recognition, an interdisciplinary field of computational linguistics, supports the development of technologies that enable the recognition and translation of speech into text. Text summarization extracts the most important information from a source text and provides an adequate summary of it. The research work presented in this paper describes an easy and effective method for speech recognition: speech is converted into the corresponding text, from which a summarized text is produced. This has various applications, such as creating lecture notes and summarizing catalogues for lengthy documents. Extensive experimentation is performed to validate the efficiency of the proposed method.
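Only the text side of this pipeline can be sketched without a speech recognition engine. Assuming a transcript is already available, a common baseline for extractive summarization scores each sentence by the normalised frequency of its content words and keeps the top `k` sentences in their original order. The abstract does not state which summarization method the authors use, so this word-frequency approach, the small stop-word list and the function name `summarize` are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "be", "can", "for", "in", "is", "it",
             "of", "on", "the", "then", "to"}

def content_words(text):
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(transcript, k=2):
    # Naive sentence split on ".", "!" or "?" followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    freq = Counter(content_words(transcript))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(sentence):
        # Mean normalised frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] / top for w in words) / (len(words) or 1)

    # Pick the k best-scoring sentences, then restore their original order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Averaging over a sentence's content words, rather than summing them, keeps long sentences from winning simply because of their length.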