5,218 research outputs found
Multi-argument classification for semantic role labeling
This paper describes a Multi-Argument Classification (MAC) approach to Semantic Role Labeling. The goal is to exploit dependencies between semantic roles by simultaneously classifying all arguments as a pattern. Argument identification, as a pre-processing stage, is carried at using the improved Predicate-Argument Recognition Algorithm (PARA) developed by Lin and Smith (2006). Results using standard evaluation metrics show that multi-argument classification, archieving 76.60 in F₁ measurement on WSJ 23, outperforms existing systems that use a single parse tree for the CoNLL 2005 shared task data. This paper also describes ways to significantly increase the speed of multi-argument classification, making it suitable for real-time language processing tasks that require semantic role labelling
Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression
This is the Author's Original Manuscript of an article whose final and definitive form, the Version of Record, has been published in the Journal of New Music Research, November 2011, copyright Taylor & Francis. The published article is available online at http://www.tandfonline.com/10.1080/09298215.2011.596938
XLearn : learning activity labels across heterogeneous datasets
Sensor-driven systems often need to map sensed data into meaningfully labelled activities to classify the phenomena being observed. A motivating and challenging example comes from human activity recognition in which smart home and other datasets are used to classify human activities to support applications such as ambient assisted living, health monitoring, and behavioural intervention. Building a robust and meaningful classifier needs annotated ground truth, labelled with what activities are actually being observed—and acquiring high-quality, detailed, continuous annotations remains a challenging, time-consuming, and error-prone task, despite considerable attention in the literature. In this article, we use knowledge-driven ensemble learning to develop a technique that can combine classifiers built from individually labelled datasets, even when the labels are sparse and heterogeneous. The technique both relieves individual users of the burden of annotation and allows activities to be learned individually and then transferred to a general classifier. We evaluate our approach using four third-party, real-world smart home datasets and show that it enhances activity recognition accuracies even when given only a very small amount of training data.PostprintPeer reviewe
A simple way to estimate similarity between pairs of eye movement sequences
We propose a novel algorithm to estimate the similarity between a pair of eye movement sequences. The proposed algorithm relies on a straight-forward geometric representation of eye movement data. The algorithm is considerably simpler to implement and apply than existing similarity measures, and is particularly suited for exploratory analyses. To validate the algorithm, we conducted a benchmark experiment using realistic artificial eye movement data. Based on similarity ratings obtained from the proposed algorithm, we defined two clusters in an unlabelled set of eye movement sequences. As a measure of the algorithm's sensitivity, we quantified the extent to which these data-driven clusters matched two pre-defined groups (i.e., the 'real' clusters). The same analysis was performed using two other, commonly used similarity measures. The results show that the proposed algorithm is a viable similarity measure
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
Large-scale inference in the focally damaged human brain
Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on internationally the largest available set of MR imaging data of focal brain injury in the context of acute stroke (N=1333) and employ kernel machines at the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings
An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning
Manifold-learning techniques are routinely used in mining complex
spatiotemporal data to extract useful, parsimonious data
representations/parametrizations; these are, in turn, useful in nonlinear model
identification tasks. We focus here on the case of time series data that can
ultimately be modelled as a spatially distributed system (e.g. a partial
differential equation, PDE), but where we do not know the space in which this
PDE should be formulated. Hence, even the spatial coordinates for the
distributed system themselves need to be identified - to emerge from - the data
mining process. We will first validate this emergent space reconstruction for
time series sampled without space labels in known PDEs; this brings up the
issue of observability of physical space from temporal observation data, and
the transition from spatially resolved to lumped (order-parameter-based)
representations by tuning the scale of the data mining kernels. We will then
present actual emergent space discovery illustrations. Our illustrative
examples include chimera states (states of coexisting coherent and incoherent
dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics,
arising in partial differential equations and/or in heterogeneous networks. We
also discuss how data-driven spatial coordinates can be extracted in ways
invariant to the nature of the measuring instrument. Such gauge-invariant data
mining can go beyond the fusion of heterogeneous observations of the same
system, to the possible matching of apparently different systems
- …