Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Research
A Machine Learning Approach for Plagiarism Detection
Plagiarism detection is gaining increasing importance due to requirements for integrity in education. Existing research has investigated the problem of plagiarism detection with varying degrees of success. The literature reveals two main methods for detecting plagiarism, namely extrinsic and intrinsic.
This thesis has developed two novel approaches to address both of these methods. Firstly, a novel extrinsic method for detecting plagiarism is proposed. The method is based on four well-known techniques, namely Bag of Words (BOW), Latent Semantic Analysis (LSA), Stylometry and Support Vector Machines (SVM). The LSA application was fine-tuned to take in the stylometric features (most common words) in order to characterise the document authorship, as described in chapter 4. The results revealed that LSA-based stylometry outperformed the traditional LSA application. Support vector machine based algorithms were used to perform the classification procedure in order to predict which author has written a particular book being tested. The proposed method has successfully addressed the limitations of semantic characteristics and identified the document source by assigning the book being tested to the right author in most cases.
Secondly, the intrinsic detection method has relied on the use of the statistical properties of the most common words. LSA was applied in this method to a group of most common words (MCWs) to extract their usage patterns based on the transitivity property of LSA. The feature sets of the intrinsic model were based on the frequency of the most common words, their relative frequencies in series, and the deviation of these frequencies across all books for a particular author.
The intrinsic method aims to generate a model of author “style” by revealing a set of certain features of authorship. The model’s generation procedure focuses on just one author as an attempt to summarise aspects of an author’s style in a definitive and clear-cut manner.
The thesis has also proposed a novel experimental methodology for testing the performance of both extrinsic and intrinsic methods for plagiarism detection. This methodology relies upon the CEN (Corpus of English Novels) training dataset, but divides that dataset up into training and test datasets in a novel manner. Both approaches have been evaluated using the well-known leave-one-out cross-validation method. Results indicated that by integrating deep analysis (LSA) and stylometric analysis, hidden changes can be identified whether or not a reference collection exists.
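The frequency-based feature idea mentioned in this abstract (relative frequencies of the most common words) can be illustrated with a minimal sketch. This is not the thesis's implementation: it omits the LSA and SVM stages entirely, and the tokenizer and the function names `tokenize`, `most_common_words` and `relative_frequencies` are hypothetical choices made only for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_words(texts, k):
    """Return the k most frequent words across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word within a single text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]
```

Each book would then be represented by one such vector over a shared vocabulary, which a downstream model could compare across books by the same author.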
Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation
A human computation system can be viewed as a distributed system in which the
processors are humans, called workers. Such systems harness the cognitive power
of a group of workers connected to the Internet to execute relatively simple
tasks, whose solutions, once grouped, solve a problem that systems equipped
with only machines could not solve satisfactorily. Examples of such systems are
Amazon Mechanical Turk and the Zooniverse platform. A human computation
application comprises a group of tasks, each of which can be performed by one
worker. Tasks might have dependencies among each other. In this study, we
propose a theoretical framework to analyze this type of application from a
distributed systems point of view. Our framework is established on three
dimensions that represent different perspectives in which human computation
applications can be approached: quality-of-service requirements, design and
management strategies, and human aspects. By using this framework, we review
human computation from the perspective of programmers seeking to improve the
design of human computation applications and managers seeking to increase the
effectiveness of human computation infrastructures in running such
applications. In doing so, besides integrating and organizing what has been
done in this direction, we also put into perspective the fact that the human
aspects of the workers in such systems introduce new challenges in terms of,
for example, task assignment, dependency management, and fault prevention and
tolerance. We discuss how they are related to distributed systems and other
areas of knowledge. Comment: 3 figures, 1 table
Master of Science thesis
It is common to extract isosurfaces from simulation field data to visualize and gain understanding of the underlying physical phenomenon being simulated. As the input parameters of the simulation change, the resulting isosurface varies, and there has been increased interest in quantifying and visualizing these variations as part of the larger interest in uncertainty quantification. In this thesis, we propose an analysis and visualization pipeline for examining the intrinsic variation in isosurfaces caused by simulation parameter perturbation. Drawing inspiration from the shape modeling community, we incorporate the use of heat-kernel signatures (HKS) with a simple finite-difference approach for quantifying the degree to which a region (or even a point) on an isosurface has undergone intrinsic change. Coupled with a clustering technique and the use of color maps, our pipeline allows the user to select the level of fidelity with which they wish to evaluate and visualize the amount of intrinsic change. The pipeline is described with a simple example to walk the reader through the different steps, and experimental validation of parameter choices in the pipeline is provided to justify our design. Then we present canonical and simulation examples to demonstrate the pipeline's use in different applications.
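For orientation, the heat-kernel signature referred to above has a standard spectral form, sketched below. The second expression is only an assumed illustration of what a finite-difference measure of intrinsic change might look like; the parameter value p, the step h, and the corresponding point x' on the perturbed isosurface are notation introduced here, not taken from the thesis.

```latex
% Standard heat-kernel signature at point x and time scale t, where
% \lambda_i and \phi_i are the eigenvalues and eigenfunctions of the
% Laplace-Beltrami operator of the surface.
\[
  \mathrm{HKS}(x, t) = \sum_{i \ge 0} e^{-\lambda_i t}\, \phi_i(x)^2
\]
% Assumed forward difference in a simulation parameter p with step h,
% comparing x on the isosurface at p with a corresponding point x' on
% the isosurface at p + h (the correspondence is an assumption).
\[
  \Delta_h(x, t) = \frac{\left| \mathrm{HKS}_{p+h}(x', t) - \mathrm{HKS}_{p}(x, t) \right|}{h}
\]
```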
Stability and change of secondary school students' motivation profiles in mathematics: Effects of a student intervention.
There is broad agreement that motivation is an important factor for successful learning processes and outcomes. But how do students differ in terms of motivation and how do these differences affect the effectiveness of a motivation intervention? As an intervention interacts with students' characteristics, students' heterogeneity must be considered and homogeneous intervention effects must be critically examined. This study aimed to identify motivation profiles of a specifically vulnerable student group, namely students in the lowest ability tier in the learning of mathematics. Within the framework of self-determination theory, we investigated how these profiles changed during Grade 7 and Grade 8. Furthermore, the study examined whether a particular intervention setting aimed at promoting positive emotions and motivation in learning had an impact on the patterns of change in the specific motivation profiles compared to students in the control condition. A latent profile analysis based on self-reported intrinsic, identified, introjected, and external regulation of 348 students revealed three motivation profiles, consisting of (a) low-mixed, (b) high-mixed, and (c) self-determined. Results of the latent transition analysis indicated that the majority of students tended to remain in the same profile and also revealed different effects of the intervention on different motivation profiles. The intervention seemed to be better tailored to students in the low-mixed motivation profile than to students in other profiles. This result highlights the nature of differential effects between students.
Visualizing and Predicting the Effects of Rheumatoid Arthritis on Hands
This dissertation was inspired by the difficult decisions that patients with chronic diseases have to make about treatment options in light of uncertainty. We look at rheumatoid arthritis (RA), a chronic, autoimmune disease that primarily affects the synovial joints of the hands and causes pain and deformities. In this work, we focus on several parts of a computer-based decision tool with which patients can interact using gestures, ask questions about the disease, and visualize possible futures. We propose a hand gesture based interaction method that is easily set up in a doctor's office and can be trained using a custom set of gestures that are least painful. Our system is versatile and can be used for operations ranging from simple selections to navigating a 3D world. We propose a point distribution model (PDM) that is capable of modeling hand deformities that occur due to RA and a generalized fitting method for use on radiographs of hands. Using our shape model, we show novel visualization of disease progression. Using expertly staged radiographs, we propose a novel distance metric learning and embedding technique that can be used to automatically stage an unlabeled radiograph. Given a large set of expertly labeled radiographs, our data-driven approach can be used to extract different modes of deformation specific to a disease.
Heart Rate Variability Dynamics for the Prognosis of Cardiovascular Risk
Statistical, spectral, multi-resolution and non-linear methods were applied to heart rate variability (HRV) series linked with classification schemes for the prognosis of cardiovascular risk. A total of 90 HRV records were analyzed: 45 from healthy subjects and 45 from cardiovascular risk patients. A total of 52 features from all the analysis methods were evaluated using the standard two-sample Kolmogorov-Smirnov test (KS-test). The results of the statistical procedure provided input to multi-layer perceptron (MLP) neural networks, radial basis function (RBF) neural networks and support vector machines (SVM) for data classification. These schemes showed high performance with both training and test sets and many combinations of features (with a maximum accuracy of 96.67%). Additionally, the results strongly suggest breathing frequency as a relevant feature in the HRV analysis.
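The feature-screening step described here, comparing each feature's distribution between the two groups with a two-sample KS test, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: it computes only the KS statistic (no p-value), and the helper names `ks_statistic` and `rank_features` and the dictionary layout are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The gap between two step functions can only change at sample points.
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_features(healthy, at_risk):
    """Rank features by KS statistic, largest group separation first.

    healthy / at_risk: dict mapping a feature name to a list of values
    (an assumed layout chosen for this sketch).
    """
    scores = {name: ks_statistic(healthy[name], at_risk[name]) for name in healthy}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Features with the largest statistic separate the two groups most clearly and would be natural candidates to pass on to a classifier.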