
    Classification and feature selection algorithms for multi-class CGH data

    Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by high-dimensional arrays of intervals, where each sample consists of long runs of intervals with losses and gains.
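
    As a rough illustration of the data described above, the sketch below encodes each CGH sample as a vector of interval states (-1 for loss, 0 for no change, +1 for gain) and runs a generic multi-class classifier with univariate feature selection. The encoding, dimensions, and off-the-shelf scikit-learn components are illustrative assumptions; the paper's own classification and feature selection algorithms are not reproduced here.

```python
# Hypothetical sketch: CGH samples as arrays of interval states, classified
# with generic univariate feature selection.  All sizes and tools are
# illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples, n_intervals, n_classes = 120, 800, 4

# Each interval is coded as -1 (loss), 0 (no change), or +1 (gain).
X = rng.choice([-1, 0, 1], size=(n_samples, n_intervals), p=[0.2, 0.6, 0.2])
y = rng.integers(0, n_classes, size=n_samples)

# Keep a small subset of informative intervals, then train a linear classifier.
model = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC(max_iter=5000))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```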

    Remote sensing in support of conservation and management of heathland vegetation


    SCALABLE ALGORITHMS FOR HIGH DIMENSIONAL STRUCTURED DATA

    Emerging technologies and digital devices provide us with increasingly large volumes of data, with respect to both the sample size and the number of features. To exploit the benefits of massive data sets, scalable statistical models and machine learning algorithms are increasingly important across research disciplines. For robust and accurate prediction, prior knowledge about the dependency structure within the data needs to be formulated appropriately in these models. On the other hand, the scalability and computational complexity of existing algorithms may not meet the needs of analyzing massive high-dimensional data. This dissertation presents several novel methods to scale up sparse learning models for massive data sets. We first present our novel safe active incremental feature (SAIF) selection algorithm for LASSO (least absolute shrinkage and selection operator), with a time-complexity analysis that shows its advantages over state-of-the-art methods. As SAIF targets general convex loss functions, it can potentially be extended to many learning models and big-data applications, and we show how support vector machines (SVMs) can be scaled up based on the idea of SAIF. Secondly, we propose screening methods for the generalized LASSO (GL), which explicitly considers the dependency structure among features. We also propose a scalable feature selection method for non-parametric, non-linear models based on sparse structures and kernel methods. Theoretical analysis and experimental results in this dissertation show that model complexity can be significantly reduced under the sparsity and structure assumptions.
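
    To make the screening idea concrete, the sketch below discards features that provably have zero LASSO coefficients before solving the reduced problem. It uses the classic static SAFE rule as a stand-in; SAIF's actual active, incremental rules are not shown, and the problem sizes and regularization level are illustrative assumptions.

```python
# Minimal sketch of LASSO feature screening (static SAFE rule as a stand-in
# for SAIF-style methods).  Data, lambda, and sizes are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 5000
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:10] = rng.standard_normal(10)          # only 10 truly active features
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Objective: 0.5 * ||y - Xw||^2 + lam * ||w||_1
lam_max = np.max(np.abs(X.T @ y))              # smallest lam giving the all-zero solution
lam = 0.9 * lam_max

# SAFE rule: feature j is provably inactive if
#   |x_j^T y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max
corr = np.abs(X.T @ y)
bound = lam - np.linalg.norm(X, axis=0) * np.linalg.norm(y) * (lam_max - lam) / lam_max
keep = corr >= bound
print(f"screened out {np.sum(~keep)} of {p} features")

# Solve the LASSO only on the surviving features (sklearn scales by 1/n).
model = Lasso(alpha=lam / n, max_iter=10000).fit(X[:, keep], y)
```

    The point of the rule is that the discarded features are guaranteed to have zero coefficients at this regularization level, so solving the reduced problem returns the same solution at a fraction of the cost.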

    Combination Strategies for Semantic Role Labeling

    This paper introduces and analyzes a battery of inference models for the problem of semantic role labeling: one based on constraint satisfaction, and several strategies that model the inference as a meta-learning problem using discriminative classifiers. These classifiers are developed with a rich set of novel features that encode proposition- and sentence-level information. To our knowledge, this is the first work that (a) performs a thorough analysis of learning-based inference models for semantic role labeling, and (b) compares several inference strategies in this context. We evaluate the proposed inference strategies in the framework of the CoNLL-2005 shared task using only automatically generated syntactic information. The extensive experimental evaluation and analysis indicate that all the proposed inference strategies are successful (they all outperform the current best results reported in the CoNLL-2005 evaluation exercise), but each of the proposed approaches has its advantages and disadvantages. Several important traits of a state-of-the-art SRL combination strategy emerge from this analysis: (i) individual models should be combined at the granularity of candidate arguments rather than at the granularity of complete solutions; (ii) the best combination strategy uses an inference model based on learning; and (iii) the learning-based inference benefits from max-margin classifiers and global feedback.
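
    The sketch below illustrates point (i): candidate arguments proposed by several base models are pooled, re-scored by a simple learned combination, and then selected greedily so that arguments for one predicate do not overlap. The feature design, weights, and greedy inference step are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of argument-level combination for SRL: pool candidate
# arguments from several base models, re-score them, and greedily pick a set
# of non-overlapping arguments.  Labels, scores, and weights are made up.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    start: int            # token span start (inclusive)
    end: int              # token span end (inclusive)
    label: str            # argument label, e.g. "A0" or "AM-TMP"
    scores: List[float]   # confidence from each base model (0 if not proposed)

def meta_score(c: Candidate, weights: List[float]) -> float:
    """Simple learned combination: weighted sum of base-model confidences."""
    return sum(w * s for w, s in zip(weights, c.scores))

def combine(candidates: List[Candidate], weights: List[float]) -> List[Candidate]:
    """Greedy inference: keep the highest-scoring candidates that do not overlap."""
    chosen: List[Candidate] = []
    for c in sorted(candidates, key=lambda c: meta_score(c, weights), reverse=True):
        if all(c.end < o.start or c.start > o.end for o in chosen):
            chosen.append(c)
    return chosen

# Candidates for one predicate from three base models.
cands = [
    Candidate(0, 1, "A0", [0.9, 0.8, 0.0]),
    Candidate(0, 2, "A0", [0.0, 0.0, 0.6]),      # overlaps the first candidate
    Candidate(5, 7, "AM-TMP", [0.4, 0.5, 0.7]),
]
print(combine(cands, weights=[1.0, 1.0, 1.0]))
```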