
    Multi-argument classification for semantic role labeling

    This paper describes a Multi-Argument Classification (MAC) approach to Semantic Role Labeling. The goal is to exploit dependencies between semantic roles by simultaneously classifying all arguments as a pattern. Argument identification, as a pre-processing stage, is carried out using the improved Predicate-Argument Recognition Algorithm (PARA) developed by Lin and Smith (2006). Results using standard evaluation metrics show that multi-argument classification, achieving an F₁ score of 76.60 on WSJ 23, outperforms existing systems that use a single parse tree for the CoNLL 2005 shared task data. This paper also describes ways to significantly increase the speed of multi-argument classification, making it suitable for real-time language processing tasks that require semantic role labelling.

    Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression

    This is the Author's Original Manuscript of an article whose final and definitive form, the Version of Record, has been published in the Journal of New Music Research, November 2011, copyright Taylor & Francis. The published article is available online at http://www.tandfonline.com/10.1080/09298215.2011.596938

    XLearn : learning activity labels across heterogeneous datasets

    Sensor-driven systems often need to map sensed data into meaningfully labelled activities to classify the phenomena being observed. A motivating and challenging example comes from human activity recognition, in which smart home and other datasets are used to classify human activities to support applications such as ambient assisted living, health monitoring, and behavioural intervention. Building a robust and meaningful classifier needs annotated ground truth, labelled with what activities are actually being observed. Acquiring high-quality, detailed, continuous annotations remains a challenging, time-consuming, and error-prone task, despite considerable attention in the literature. In this article, we use knowledge-driven ensemble learning to develop a technique that can combine classifiers built from individually labelled datasets, even when the labels are sparse and heterogeneous. The technique both relieves individual users of the burden of annotation and allows activities to be learned individually and then transferred to a general classifier. We evaluate our approach using four third-party, real-world smart home datasets and show that it enhances activity recognition accuracies even when given only a very small amount of training data.

    Data mining based cyber-attack detection


    A simple way to estimate similarity between pairs of eye movement sequences

    We propose a novel algorithm to estimate the similarity between a pair of eye movement sequences. The proposed algorithm relies on a straightforward geometric representation of eye movement data. The algorithm is considerably simpler to implement and apply than existing similarity measures, and is particularly suited for exploratory analyses. To validate the algorithm, we conducted a benchmark experiment using realistic artificial eye movement data. Based on similarity ratings obtained from the proposed algorithm, we defined two clusters in an unlabelled set of eye movement sequences. As a measure of the algorithm's sensitivity, we quantified the extent to which these data-driven clusters matched two pre-defined groups (i.e., the 'real' clusters). The same analysis was performed using two other, commonly used similarity measures. The results show that the proposed algorithm is a viable similarity measure.

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning has revealed not only the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey presents not only an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up a possible solution accordingly.

    Large-scale inference in the focally damaged human brain

    Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on the largest set of MR imaging data of focal brain injury in the context of acute stroke available internationally (N=1333) and employ kernel machines as the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings.

    An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning

    Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here on the case of time series data that can ultimately be modelled as a spatially distributed system (e.g. a partial differential equation, PDE), but where we do not know the space in which this PDE should be formulated. Hence, even the spatial coordinates for the distributed system themselves need to be identified by (that is, to emerge from) the data mining process. We will first validate this emergent space reconstruction for time series sampled without space labels in known PDEs; this brings up the issue of observability of physical space from temporal observation data, and the transition from spatially resolved to lumped (order-parameter-based) representations by tuning the scale of the data mining kernels. We will then present actual emergent space discovery illustrations. Our illustrative examples include chimera states (states of coexisting coherent and incoherent dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics, arising in partial differential equations and/or in heterogeneous networks. We also discuss how data-driven spatial coordinates can be extracted in ways invariant to the nature of the measuring instrument. Such gauge-invariant data mining can go beyond the fusion of heterogeneous observations of the same system, to the possible matching of apparently different systems.