On the use of text classification methods for text summarisation
This thesis describes research work undertaken in the fields of text and questionnaire mining. More specifically, the research work is directed at the use of text classification techniques for the purpose of summarising the free text part of questionnaires. In this thesis text summarisation is conceived of as a form of text classification in that the classes assigned to text documents can be viewed as an indication (summarisation) of the main ideas of the original free text but in a coherent and reduced form. This type of summary is considered because conventional text summarisation techniques are not deemed to be effective for unstructured free text, such as that found in questionnaires. Four approaches are described in the context of the classification summarisation of free text from different sources, focused on the free text part of questionnaires. The first approach considers the use of standard classification techniques for text summarisation and was motivated by the desire to establish a benchmark with which the more specialised summarisation classification techniques presented later in this thesis could be compared. The second approach, called Classifier Generation Using Secondary Data (CGUSD), addresses the case when the available data is not considered sufficient for training purposes (or possibly because no data is available at all). The third approach, called Semi-Automated Rule Summarisation Extraction Tool (SARSET), presents a semi-automated classification technique to support document summarisation classification in which there is more involvement by the domain experts in the classifier generation process; the idea was that this might serve to produce more effective summaries.
The fourth is a hierarchical summarisation classification approach which assumes that text summarisation can be achieved using a classification approach whereby several class labels can be associated with documents which then constitute the summarisation. For evaluation purposes three types of text were considered: (i) questionnaire free text, (ii) text from medical abstracts and (iii) text from news stories.
Computational acquisition of knowledge in small-data environments: a case study in the field of energetics
The UK’s defence industry is accelerating its implementation of artificial intelligence, including
expert systems and natural language processing (NLP) tools designed to supplement human
analysis. This thesis examines the limitations of NLP tools in small-data environments (common
in defence) in the defence-related energetic-materials domain. A literature review identifies
the domain-specific challenges of developing an expert system (specifically an ontology). The
absence of domain resources such as labelled datasets and, most significantly, the preprocessing
of text resources are identified as challenges. To address the latter, a novel general-purpose
preprocessing pipeline specifically tailored for the energetic-materials domain is developed. The
effectiveness of the pipeline is evaluated.
The boundary between using NLP tools to supplement human analysis and using them to replace
it completely in data-limited environments is examined in a study of the subjective
concept of importance. A methodology for directly comparing the ability of NLP tools
and experts to identify important points in the text is presented. Results show the participants
of the study exhibit little agreement, even on which points in the text are important. The NLP tools,
the expert (author of the text being examined) and the participants agree only on general statements.
However, as a group, the participants agreed with the expert. In data-limited environments,
the extractive-summarisation tools examined cannot identify the important points
in a technical document as effectively as an expert.
A methodology for the classification of journal articles by the technology readiness level (TRL)
of the described technologies in a data-limited environment is proposed. Techniques to overcome
challenges with using real-world data such as class imbalances are investigated. A methodology
to evaluate the reliability of human annotations is presented. Analysis identifies a lack of
agreement and consistency in the expert evaluation of document TRL.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table.
Crowdsourced intuitive visual design feedback
For many people images are a medium preferable to text and yet, with the exception of
star ratings, most formats for conventional computer mediated feedback focus on text.
This thesis develops a new method of crowd feedback for designers based on images.
Visual summaries are generated from a crowd’s feedback images chosen in response to
a design. The summaries provide the designer with impressionistic and inspiring visual
feedback. The thesis sets out the motivation for this new method, describes the
development of perceptually organised image sets and a summarisation algorithm to
implement it. Evaluation studies are reported which, through a mixed methods
approach, provide evidence of the validity and potential of the new image-based
feedback method.
It is concluded that the visual feedback method would be more appealing than text for
that section of the population who may be of a visual cognitive style. Indeed the
evaluation studies are evidence that such users believe images are as good as text when
communicating their emotional reaction about a design. Designer participants reported
being inspired by the visual feedback, whereas they were not comparably inspired by
text. They also reported that the feedback can represent the perceived mood in their
designs, and that they would be enthusiastic users of a service offering this new form of
visual design feedback.
User-centred video abstraction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. The rapid growth of digital video content in recent years has created a need for technologies that can produce condensed but semantically rich versions of the input video stream in an effective manner. Consequently, the topic of Video Summarisation is becoming increasingly popular in the multimedia community and numerous video abstraction approaches have been proposed. These techniques can be divided into two major categories, automatic and semi-automatic, according to the level of human intervention required in the summarisation process. The fully-automated methods mainly apply mathematical and statistical algorithms to low-level visual, aural and textual features in order to extract the most significant segments of the original video. However, the effectiveness of this type of technique is restricted by a number of factors such as domain-dependency, computational expense and the inability to understand the semantics of videos from low-level features. The second category of techniques, however, attempts to improve the quality of summaries by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user’s subjectivity and other external contributing factors such as distraction can degrade the performance of this group of approaches. Accordingly, in this thesis we have focused on the development of three effective user-centred video summarisation techniques that can be applied to different video categories and generate satisfactory results. In our first proposed approach, a novel mechanism for user-centred video summarisation is presented for scenarios in which multiple actors are employed in the video summarisation process in order to minimise the negative effects of relying on a single user.
Based on our recommended algorithm, the video frames were initially scored by a group of video annotators ‘on the fly’. This was followed by averaging these assigned scores in order to generate a singular saliency score for each video frame and, finally, the highest scored video frames alongside the corresponding audio and textual contents were extracted to be included into the final summary. The effectiveness of our approach has been assessed by comparing the video summaries generated based on our approach against the results obtained from three existing automatic summarisation tools that adopt different modalities for abstraction purposes. The experimental results indicated that our proposed method is capable of delivering remarkable outcomes in terms of Overall Satisfaction and Precision with an acceptable Recall rate, indicating the usefulness of involving user input in the video summarisation process. In an attempt to provide a better user experience, we have proposed our personalised video summarisation method with an ability to customise the generated summaries in accordance with the viewers’ preferences. Accordingly, the end-user’s priority levels towards different video scenes were captured and utilised for updating the average scores previously assigned by the video annotators. Finally, our earlier proposed summarisation method was adopted to extract the most significant audio-visual content of the video. Experimental results indicated the capability of this approach to deliver superior outcomes compared with our previously proposed method and the three other automatic summarisation tools. Finally, we have attempted to reduce the required level of audience involvement for personalisation purposes by proposing a new method for producing personalised video summaries. Accordingly, SIFT visual features were adopted to identify the video scenes’ semantic categories. 
By fusing this retrieved data with pre-built user profiles, personalised video abstracts can be created. Experimental results showed the effectiveness of this method in delivering superior outcomes compared with our previously recommended algorithm and the three other automatic summarisation techniques.
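The multi-annotator scoring step described in this abstract (score each frame, average the scores, keep the highest-scoring frames) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the score scale and the fixed summary length `k` are assumptions made for illustration only, and the audio and textual content that the thesis extracts alongside the frames is omitted.

```python
from statistics import mean


def average_frame_scores(annotator_scores):
    """Average per-frame scores across annotators.

    annotator_scores: one list per annotator, each holding one score per frame
    (hypothetical input layout, not taken from the thesis).
    """
    n_frames = len(annotator_scores[0])
    if any(len(scores) != n_frames for scores in annotator_scores):
        raise ValueError("every annotator must score every frame")
    # zip(*...) groups the scores that different annotators gave to the same frame
    return [mean(frame_scores) for frame_scores in zip(*annotator_scores)]


def select_summary_frames(saliency, k):
    """Return the indices of the k highest-scoring frames, in playback order."""
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:k])


# Three hypothetical annotators scoring four frames
scores = [
    [1, 5, 3, 4],
    [2, 4, 3, 5],
    [3, 3, 3, 3],
]
saliency = average_frame_scores(scores)
summary = select_summary_frames(saliency, k=2)
print(saliency, summary)
```

Averaging is what limits the influence of any single annotator's subjectivity; a personalised variant along the lines the abstract describes would adjust `saliency` with the viewer's scene priorities before selection.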
Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. Results of phases 1 and 2 informed the
design of the test system, and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion the project dissemination programme and future work are
outlined.
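The report refers to statistical term weighting over retrieved document sets without naming a scheme in this abstract. As a purely illustrative sketch, the following assumes TF-IDF weighting and whitespace tokenisation; neither is confirmed by the source, and `tfidf_top_terms` is a hypothetical helper, not part of CiQuest.

```python
import math
from collections import Counter


def tfidf_top_terms(documents, top_n=3):
    """Rank the terms of each document by TF-IDF weight (assumed scheme)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically for stable output
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        results.append(ranked[:top_n])
    return results


# Hypothetical retrieved set of three short documents
docs = [
    "museum archive catalogue search",
    "library catalogue search",
    "web search engine",
]
candidates = tfidf_top_terms(docs, top_n=2)
print(candidates)
```

Terms that occur in every retrieved document (here "search") receive zero weight, which is why they drop out as candidate expansion terms.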
Automatic movie analysis and summarisation
Automatic movie analysis is the task of applying Machine Learning methods to the
field of screenplays, movie scripts, and motion pictures to facilitate or enable various
tasks throughout the entirety of a movie’s life-cycle. From helping with making
informed decisions about a new movie script with respect to aspects such as its originality,
similarity to other movies, or even commercial viability, all the way to offering
consumers new and interesting ways of viewing the final movie, many stages in the
life-cycle of a movie stand to benefit from Machine Learning techniques that promise
to reduce human effort, time, or both. Within this field of automatic movie analysis,
this thesis addresses the task of summarising the content of screenplays, enabling users
at any stage to gain a broad understanding of a movie from greatly reduced data. The
contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale
data set of original movie scripts, annotated with additional meta-information such as
genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase
is the largest data set of its kind, containing scripts and information for almost
1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the
screenplay domain, which allows for extraction of highly informative and important
scenes from movie scripts. The extracted summaries allow for the content of the original
script to stay largely intact and provide the user with its important parts, while
greatly reducing the script-reading time. (iii) We extend our summarisation model
to capture additional modalities beyond the screenplay text. The model is rendered
multi-modal by introducing visual information obtained from the actual movie and by
extracting scenes from the movie, allowing users to generate visual summaries of motion
pictures. (iv) We devise a novel end-to-end neural network model for generating
natural language screenplay overviews. This model enables the user to generate short
descriptive and informative texts that capture certain aspects of a movie script, such as
its genres, approximate content, or style, allowing them to gain a fast, high-level understanding
of the screenplay. Multiple automatic and human evaluations were carried
out to assess the performance of our models, demonstrating that they are well-suited
for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the
ScriptBase data set has started to gain traction, and is currently used by a number of
other researchers in the field to tackle various tasks relating to screenplays and their
analysis.
Using social semantic knowledge to improve annotations in personal photo collections
Instituto Politécnico de Lisboa (IPL) and Instituto Superior de Engenharia de Lisboa (ISEL): support granted through scholarship SPRH/PROTEC/67580/2010, which partially supported this work.
Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis
Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. Opinion analysis has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies which gave rise to an exponential growth in user generated subjective data available online. Opinion analysis has a rich set of applications which are used to enable opportunities for organisations, ranging from tracking user opinions about products and social issues in communities through to engagement in political participation. The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, it is observed that there are limitations in the state-of-the-art, especially as dealing with each level of granularity on its own does not solve current research issues. Therefore a novel sentence level opinion analysis approach utilising clause and phrase level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule based analysis for phrase level analysis to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to make improvements on existing limitations in the field, particularly with regard to opinion analysis automation.
Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.