Feature selection using Haar wavelet power spectrum
BACKGROUND: Feature selection is one way to overcome the 'curse of dimensionality' in complex research problems such as disease classification using microarrays. Statistical methods dominate this domain, but most of them do not suit a wide range of datasets. Transform-oriented signal processing techniques have received little attention here, although fields such as image and video processing use them well. Wavelets, one such technique, have the potential to be used for feature selection. The aim of this paper is to assess the capability of the Haar wavelet power spectrum for clustering and gene selection from expression data in the context of disease classification, and to propose a method based on it. RESULTS: Haar wavelet power spectra of genes were analysed and were observed to differ between diagnostic categories. This difference in the trend and magnitude of the spectrum may be used for gene selection. Most of the genes selected by earlier, more complex methods were also selected by the present, very simple method. Earlier works showed that only a few genes are enough to approach the classification problem [1]. Hence the present method may be tried in conjunction with other classification methods. The technique was applied without removing noise from the data, to test the robustness of the method against noise and outliers. No special software or complex implementation is needed. The quality of the genes selected by the present method was analysed through their gene expression data. Most of them were observed to be relevant to the classification problem, since they were dominant in the diagnostic category of the dataset for which they were selected as features. CONCLUSION: In the present paper, the problem of feature selection for microarray gene expression data was considered. We analysed the wavelet power spectrum of genes and proposed a clustering and feature selection method, useful for classification, based on the Haar wavelet power spectrum. The application of this technique in this area is novel; the method is simple, faster than other methods, and fits a wide range of data types. The results are encouraging and shed light on the possibility of using this technique in problem domains such as disease classification, gene network identification and personalized drug design.
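The abstract does not define its power spectrum, so the following is only a minimal sketch under one common reading: the energy (sum of squared detail coefficients) at each scale of an orthonormal Haar decomposition of a gene's expression profile. The function name `haar_power_spectrum` and the power-of-two length restriction are assumptions made for illustration, not the authors' implementation.

```python
import math


def haar_power_spectrum(signal):
    """Energy of the Haar detail coefficients at each scale, finest scale first.

    Assumption: "power" means the sum of squared detail coefficients per scale.
    Returns (spectrum, final_approximation_coefficient).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    spectrum = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Orthonormal Haar step: scaled differences (detail) and sums (approximation).
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        spectrum.append(sum(d * d for d in detail))
    return spectrum, approx[0]


spectrum, coarse = haar_power_spectrum([4, 2, 6, 8])
print(spectrum, coarse)
```

Because the transform is orthonormal, the per-scale energies plus the squared final approximation coefficient add up to the energy of the input, which gives a quick sanity check. Comparing such spectra across diagnostic categories is the kind of analysis the abstract describes.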
Elastic-Net Regularization in Learning Theory
Within the framework of statistical learning theory we analyze in detail the
so-called elastic-net regularization scheme proposed by Zou and Hastie for the
selection of groups of correlated variables. To investigate the statistical
properties of this scheme, and in particular its consistency properties, we
set up a suitable mathematical framework. Our setting is random-design
regression where we allow the response variable to be vector-valued and we
consider prediction functions which are linear combinations of elements
(features) in an infinite-dimensional dictionary. Under the assumption that the
regression function admits a sparse representation on the dictionary, we prove
that there exists a particular "elastic-net representation" of the
regression function such that, as the number of data points increases, the elastic-net
estimator is consistent not only for prediction but also for variable/feature
selection. Our results include finite-sample bounds and an adaptive scheme to
select the regularization parameter. Moreover, using convex analysis tools, we
derive an iterative thresholding algorithm for computing the elastic-net
solution which is different from the optimization procedure originally proposed
by Zou and Hastie. Comment: 32 pages, 3 figures
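As an illustration of what an iterative thresholding scheme for the elastic net can look like, here is a minimal finite-dimensional sketch. It is a generic proximal-gradient iteration, not the algorithm derived in the paper, and it assumes the objective (1/(2n))·||y − Xβ||² + l1·||β||₁ + (l2/2)·||β||². The names `elastic_net_ista`, `l1`, `l2` and `step` are hypothetical, and the scaling of the penalties is an assumption.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_ista(X, y, l1, l2, step, iterations=500):
    """Proximal-gradient sketch for the assumed elastic-net objective.

    X is a list of rows, y a list of responses. The step size should not exceed
    1 / L, where L is the largest eigenvalue of X^T X / n.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient of the least-squares term only; both penalties go into the prox.
        grad = [-sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(p)]
        beta = [
            soft_threshold(beta[j] - step * grad[j], step * l1) / (1.0 + step * l2)
            for j in range(p)
        ]
    return beta


r = 2 ** 0.5
print(elastic_net_ista([[r, 0.0], [0.0, r]], [3 * r, 0.1 * r], l1=0.5, l2=1.0, step=0.5))
```

In this toy design X^T X / n is the identity, so the minimizer has the closed form soft_threshold(z, l1) / (1 + l2) per coordinate. The small coefficient is set exactly to zero, which is the variable-selection effect the abstract refers to.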
A novel neural network approach to cDNA microarray image segmentation
This is the post-print version of the article. Copyright © 2013 Elsevier. Microarray technology has become a great source of information for biologists seeking to understand the workings of DNA, one of the most complex codes in nature. Microarray images typically contain several thousand small spots, each of which represents a different gene in the experiment. One of the key steps in extracting information from a microarray image is segmentation, whose aim is to identify which pixels within an image represent which gene. This task is greatly complicated by noise within the image and by wide variation in the values of the pixels belonging to a typical spot. Many methods have been proposed in the past for the segmentation of microarray images. In this paper, a new method is proposed that uses a series of artificial neural networks based on multi-layer perceptron (MLP) and Kohonen networks. The proposed method is applied to a set of real-world cDNA images. Quantitative comparisons between the proposed method and the commercial software GenePix® are carried out in terms of the peak signal-to-noise ratio (PSNR). The method is shown not only to deliver results comparable and even superior to existing techniques but also to have a faster run time. This work was funded in part by the National Natural Science Foundation of China under Grants 61174136 and 61104041, the Natural Science Foundation of Jiangsu Province of China under Grant BK2011598, the International Science and Technology Cooperation Project of China under Grant No. 2011DFA12910, the Engineering and Physical Sciences Research Council (EPSRC) of the U.K. under Grant GR/S27658/01, the Royal Society of the U.K., and the Alexander von Humboldt Foundation of Germany.
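For readers unfamiliar with the metric, the sketch below computes the standard PSNR between two equally sized grayscale images given as nested lists. How the paper applies PSNR to segmentation results is not stated in the abstract; the function name `psnr` and the 8-bit default `max_value=255.0` are assumptions made for illustration.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    flat_ref = [pixel for row in reference for pixel in row]
    flat_test = [pixel for row in test for pixel in row]
    if len(flat_ref) != len(flat_test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[10, 20], [30, 40]]
noisy = [[12, 18], [33, 40]]
print(psnr(clean, noisy))
```

Higher values indicate that the test image is closer to the reference.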
Learning the Structure for Structured Sparsity
Structured sparsity has recently emerged in statistics, machine learning and
signal processing as a promising paradigm for learning in high-dimensional
settings. All existing methods for learning under the assumption of structured
sparsity rely on prior knowledge on how to weight (or how to penalize)
individual subsets of variables during the subset selection process, which is
not available in general. Inferring group weights from data is a key open
research problem in structured sparsity. In this paper, we propose a Bayesian
approach to the problem of group weight learning. We model the group weights as
hyperparameters of heavy-tailed priors on groups of variables and derive an
approximate inference scheme to infer these hyperparameters. We empirically
show that we are able to recover the model hyperparameters when the data are
generated from the model, and we demonstrate the utility of learning weights in
synthetic and real denoising problems.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
A variable screening procedure via correlation learning was proposed by Fan and
Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models.
Even when the true model is linear, the marginal regression can be highly
nonlinear. To address this issue, we further extend the correlation learning to
marginal nonparametric learning. Our nonparametric independence screening,
called NIS, is a specific type of sure independence screening. Several
closely related variable screening procedures are proposed. Under the
nonparametric additive models, it is shown that under some mild technical
conditions, the proposed independence screening methods enjoy a sure screening
property. The extent to which the dimensionality can be reduced by independence
screening is also explicitly quantified. As a methodological extension, an
iterative nonparametric independence screening (INIS) is also proposed to
enhance the finite sample performance for fitting sparse additive models. The
simulation results and a real data analysis demonstrate that the proposed
procedure works well with moderate sample size and large dimension and performs
better than competing methods. Comment: 48 pages
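The idea of marginal nonparametric screening can be sketched as follows: fit a nonparametric regression of the response on each variable separately, score each variable by the size of its fitted function, and keep the top-scoring variables. This sketch swaps in a simple binned (piecewise-constant) fit where the paper uses its own nonparametric estimator, and it keeps a fixed number of variables rather than applying a threshold. The names `marginal_utility` and `screen` and the choice of four bins are hypothetical.

```python
def marginal_utility(x, y_centered, bins=4):
    """Mean squared fitted value of a binned regression of y_centered on x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = 0.0
    for b in range(bins):
        members = order[b * n // bins:(b + 1) * n // bins]
        if members:
            # Piecewise-constant fit: the bin mean is the fitted value for its members.
            fitted = sum(y_centered[i] for i in members) / len(members)
            total += len(members) * fitted ** 2
    return total / n


def screen(columns, y, keep):
    """Rank variables by marginal utility; return the kept indices and all scores."""
    mean_y = sum(y) / len(y)
    y_centered = [value - mean_y for value in y]
    scores = [marginal_utility(column, y_centered) for column in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


x1 = [-4, -3, -2, -1, 1, 2, 3, 4]
x2 = [0.5, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4]
y = [value * value for value in x1]  # depends on x1 only, and nonlinearly
print(screen([x1, x2], y, keep=1))
```

In this toy data the sample correlation between `x1` and `y` is exactly zero, so a purely linear marginal screen would give `x1` no credit, while the binned fit still ranks it first.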
Wavelet-Based Cancer Drug Recommender System
The molecular nature of cancer underpins systematic studies of cancer genomes,
providing valuable insights and enabling the development of clinical
treatments. Above all, these studies are driving the clinical use of genomic
information in choosing treatments that would not otherwise be expected for
patients with various types of cancer, making precision medicine possible.
With this in mind, in this project we combine image processing techniques, for
data enhancement, with recommender systems to propose a personalized ranking of
anticancer drugs. The system is implemented in Python and tested using a
database of drug sensitivity records containing more than 310,000 IC50 values,
which describe the response of more than 300 anticancer drugs across 987 cancer
cell lines.
After several preprocessing tasks, two experiments are carried out. The first
uses the original DNA microarray images and the second uses the same images
after a wavelet transform. The experiments confirm that DNA microarray images
subjected to wavelet transforms improve the performance of the recommender
system, optimizing the search for cancer cell lines whose profile is similar to
that of the new cell line.
In addition, we conclude that DNA microarray images with appropriate wavelet
transforms not only provide richer information for the search for similar
users, but also compress those images efficiently, optimizing computational
resources.
To the best of our knowledge, this project is innovative in its use of
wavelet-transformed DNA microarray images to profile cell lines in a
personalized anticancer drug recommender system.
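The abstract does not say which wavelet or similarity measure the system uses. As a minimal sketch under stated assumptions, the code below takes the approximation band of a one-level 2D Haar transform as a compressed profile and finds the most similar known cell line by cosine similarity. All names (`haar_approximation`, `most_similar`, the toy `library`) are hypothetical and the data are made up for illustration.

```python
import math


def haar_approximation(image):
    """Approximation band of a one-level orthonormal 2D Haar transform.

    Each output value is the sum of a 2x2 block divided by 2, so the result has
    a quarter of the original pixels. Assumes even numbers of rows and columns.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def profile(image):
    """Flatten the compressed image into a feature vector."""
    return [value for row in haar_approximation(image) for value in row]


def most_similar(query_image, library):
    """Name of the library entry whose compressed profile is closest to the query."""
    query = profile(query_image)
    return max(library, key=lambda name: cosine_similarity(query, profile(library[name])))


library = {
    "line_A": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "line_B": [[0, 0, 1, 1], [0, 0, 1, 1]],
}
print(most_similar([[2, 2, 0, 0], [2, 1, 0, 0]], library))
```

A recommender built this way would then rank drugs for the new cell line using the sensitivity records of its nearest neighbours; that step is omitted here.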
- …