Identification of disease-causing genes using microarray data mining and gene ontology
Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One shortcoming of microarray data is that they provide few samples relative to the number of genes. This imbalance reduces the accuracy of classification methods, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can suffer from small sample sizes, noisy data, and the fact that it does not remove redundant genes.
Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. Specifically, we combine the Fisher method and SVMRFE to exploit the advantages of a filtering method as well as an embedded method. Furthermore, we add a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology, which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as a small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.
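The filtering and redundancy-reduction stages described in the Methods can be illustrated with a minimal sketch. It covers only a two-class Fisher score followed by a greedy redundancy filter; the SVMRFE stage and the Gene Ontology information are omitted. The abstract does not state how redundancy is measured, so the absolute Pearson correlation threshold used here (`max_abs_corr`) is an assumption, as are the function names and the toy data.

```python
import numpy as np

def fisher_scores(X, y):
    """Two-class Fisher score per gene: (mu0 - mu1)^2 / (var0 + var1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def select_genes(X, y, n_keep=2, max_abs_corr=0.9):
    """Rank genes by Fisher score, then greedily skip redundant ones.

    Assumption: a gene is 'redundant' if its absolute Pearson correlation
    with an already kept gene reaches max_abs_corr. The source abstract
    does not specify its redundancy criterion.
    """
    order = np.argsort(fisher_scores(X, y))[::-1]  # best score first
    corr = np.abs(np.corrcoef(X, rowvar=False))    # gene-by-gene correlations
    kept = []
    for g in order:
        if all(corr[g, k] < max_abs_corr for k in kept):
            kept.append(int(g))
        if len(kept) == n_keep:
            break
    return kept

# Hypothetical toy data: 8 samples (rows) x 4 genes (columns).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([
    # gene0  gene1  gene2  gene3
    [0.0,   0.05,  0.0,   1.0],
    [0.2,   0.20,  1.0,   0.0],
    [0.1,   0.10,  0.0,   1.0],
    [0.3,   0.30,  1.0,   0.0],
    [2.0,   2.00,  1.0,   0.0],
    [2.2,   2.20,  2.0,   1.0],
    [2.1,   2.10,  1.0,   0.0],
    [2.3,   2.25,  2.0,   1.0],
])

kept = select_genes(X, y, n_keep=2)
print(kept)
```

In this toy matrix, gene0 and gene1 are near duplicates, so only one of them survives the redundancy filter, and the weaker but less correlated gene2 takes the second slot. In a full pipeline the surviving genes would then be passed to an embedded selector such as SVMRFE.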
Efficient inference for genetic association studies with multiple outcomes
Combined inference for heterogeneous high-dimensional data is critical in
modern biology, where clinical and various kinds of molecular data may be
available from a single study. Classical genetic association studies regress a
single clinical outcome on many genetic variants one by one, but there is an
increasing demand for joint analysis of many molecular outcomes and genetic
variants in order to unravel functional interactions. Unfortunately, most
existing approaches to joint modelling are either too simplistic to be powerful
or are impracticable for computational reasons. Inspired by Richardson et al.
(2010, Bayesian Statistics 9), we consider a sparse multivariate regression
model that allows simultaneous selection of predictors and associated
responses. As Markov chain Monte Carlo (MCMC) inference on such models can be
prohibitively slow when the number of genetic variants exceeds a few thousand,
we propose a variational inference approach which produces posterior
information very close to that of MCMC inference, at a much reduced
computational cost. Extensive numerical experiments show that our approach
outperforms popular variable selection methods and tailored Bayesian
procedures, dealing within hours with problems involving hundreds of thousands
of genetic variants and tens to hundreds of clinical or molecular outcomes.
A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed-form and without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
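The idea of feeding paired measurements into a test through distance matrices can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it computes an RV-type coefficient from double-centred squared distance matrices, and it uses a plain Monte Carlo permutation p-value as a stand-in for the closed-form null approximation mentioned in the abstract, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def centered_gram(D):
    """Double-centre squared distances: G = -1/2 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def rv_from_distances(Dx, Dy):
    """RV-type coefficient between two distance matrices on the same samples."""
    Gx, Gy = centered_gram(Dx), centered_gram(Dy)
    return np.trace(Gx @ Gy) / np.sqrt(np.trace(Gx @ Gx) * np.trace(Gy @ Gy))

def permutation_pvalue(Dx, Dy, n_perm=999, seed=0):
    """Permutation p-value (a stand-in for the closed-form null approximation)."""
    rng = np.random.default_rng(seed)
    observed = rv_from_distances(Dx, Dy)
    n = Dx.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Relabel the samples of the second data type only.
        if rv_from_distances(Dx, Dy[np.ix_(p, p)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 12 samples measured with two data types that
# share exactly the same distance structure.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 3))
Dx = euclidean_distances(X)
Dy = euclidean_distances(2.0 * X)  # same geometry, different scale

rv, pval = permutation_pvalue(Dx, Dy)
print(rv, pval)
```

Because the coefficient is normalised, rescaling one data type does not change it, which is one reason a distance-based statistic suits measurements taken at very different scales. Other distance measures can be swapped in for `euclidean_distances` to capture different aspects of the data.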
A Novel Multiobjective Cell Switch-Off Framework for Cellular Networks
Cell Switch-Off (CSO) is recognized as a promising approach to reduce the
energy consumption in next-generation cellular networks. However, CSO poses
serious challenges not only from the resource allocation perspective but also
from the implementation point of view. Indeed, CSO represents a difficult
optimization problem due to its NP-complete nature. Moreover, there are a
number of important practical limitations in the implementation of CSO schemes,
such as the need for minimizing the real-time complexity and the number of
on-off/off-on transitions and CSO-induced handovers. This article introduces a
novel approach to CSO based on multiobjective optimization that makes use of
the statistical description of the service demand (known by operators). In
addition, downlink and uplink coverage criteria are included and a comparative
analysis between different models to characterize intercell interference is
also presented to shed light on their impact on CSO. The framework
distinguishes itself from other proposals in two ways: 1) The number of
on-off/off-on transitions as well as handovers are minimized, and 2) the
computationally-heavy part of the algorithm is executed offline, which makes
its implementation feasible. The results show that the proposed scheme achieves
substantial energy savings in small cell deployments where service demand is
not uniformly distributed, without compromising the Quality-of-Service (QoS) or
requiring heavy real-time processing.
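The multiobjective view of CSO can be made concrete with a toy sketch. It enumerates every on/off pattern for three cells and keeps the Pareto-optimal trade-offs between two objectives. Everything here is a simplifying assumption for illustration: the number of active cells stands in for energy consumption, unmet aggregate demand stands in for the QoS criterion, and the capacities and demand are made-up numbers. The abstract's coverage, interference, transition, and handover criteria are not modelled.

```python
from itertools import product

def evaluate(pattern, capacity, demand):
    """Objectives for one on/off pattern (both to be minimized).

    Assumptions: active-cell count is a proxy for energy, and unmet
    aggregate demand is a proxy for QoS degradation.
    """
    active = sum(pattern)
    served = sum(c for on, c in zip(pattern, capacity) if on)
    unmet = max(0.0, demand - served)
    return (active, unmet)

def pareto_front(points):
    """Return the points that no other point dominates."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical deployment: three cells with made-up capacities and demand.
capacity = [5, 3, 2]
demand = 6

patterns = list(product([0, 1], repeat=len(capacity)))
objectives = [evaluate(p, capacity, demand) for p in patterns]
front = pareto_front(objectives)

for pattern, obj in zip(patterns, objectives):
    if obj in front:
        print(pattern, obj)
```

Exhaustive enumeration grows as 2^n in the number of cells, which is why it only suits a toy case. A search of this kind corresponds loosely to the computationally heavy part that the abstract says is executed offline, leaving only the choice among precomputed patterns for real time.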