1,262 research outputs found
Classification and feature selection algorithms for multi-class CGH data
Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by large dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains
SCALABLE ALGORITHMS FOR HIGH DIMENSIONAL STRUCTURED DATA
Emerging technologies and digital devices provide us with increasingly large volume
of data with respect to both the sample size and the number of features. To explore the benefits of massive data sets, scalable statistical models and machine learning algorithms are more and more important in different research disciplines. For robust and accurate prediction, prior knowledge regarding dependency structures within data needs to be formulated appropriately in these models. On the other hand, scalability and computation complexity of existing algorithms may not meet the needs to analyze massive high-dimensional data. This dissertation presents several novel methods to scale up sparse learning models to analyze massive data sets. We first present our novel safe active incremental feature (SAIF) selection algorithm for LASSO (least absolute shrinkage and selection operator), with the time complexity analysis to show the advantages over state of the art existing methods. As SAIF is targeting general convex loss functions, it potentially can be extended to many learning models and big-data applications, and we show how support vector machines (SVM) can be scaled up based on the idea of SAIF. Secondly, we propose screening methods to generalized LASSO (GL), which specifically considers the dependency structure among features. We also propose a scalable feature selection method for non-parametric, non-linear models based on sparse structures and kernel methods. Theoretical analysis and
experimental results in this dissertation show that model complexity can be significantly
reduced with the sparsity and structure assumptions
Combination Strategies for Semantic Role Labeling
This paper introduces and analyzes a battery of inference models for the
problem of semantic role labeling: one based on constraint satisfaction, and
several strategies that model the inference as a meta-learning problem using
discriminative classifiers. These classifiers are developed with a rich set of
novel features that encode proposition and sentence-level information. To our
knowledge, this is the first work that: (a) performs a thorough analysis of
learning-based inference models for semantic role labeling, and (b) compares
several inference strategies in this context. We evaluate the proposed
inference strategies in the framework of the CoNLL-2005 shared task using only
automatically-generated syntactic information. The extensive experimental
evaluation and analysis indicates that all the proposed inference strategies
are successful -they all outperform the current best results reported in the
CoNLL-2005 evaluation exercise- but each of the proposed approaches has its
advantages and disadvantages. Several important traits of a state-of-the-art
SRL combination strategy emerge from this analysis: (i) individual models
should be combined at the granularity of candidate arguments rather than at the
granularity of complete solutions; (ii) the best combination strategy uses an
inference model based in learning; and (iii) the learning-based inference
benefits from max-margin classifiers and global feedback
- …