Identification of disease-causing genes using microarray data mining and gene ontology
Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes.
Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.
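The filter and redundancy-reduction stages described in the Methods paragraph could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two-class Fisher score is the standard textbook form, the Pearson-correlation redundancy test, the 0.9 threshold, the function names and the toy expression values are all assumptions, and the SVMRFE and Gene Ontology stages are omitted.

```python
from math import sqrt
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Standard two-class Fisher score: (mu0 - mu1)^2 / (var0 + var1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0

def pearson(x, y):
    """Pearson correlation between two equally long value lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_genes(expr, labels, k, max_corr=0.9):
    """Rank genes by Fisher score, then skip genes that are highly
    correlated with an already kept gene (assumed redundancy test)."""
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels),
                    reverse=True)
    kept = []
    for gene in ranked:
        if all(abs(pearson(expr[gene], expr[h])) < max_corr for h in kept):
            kept.append(gene)
        if len(kept) == k:
            break
    return kept

# Hypothetical toy data: gene -> expression value per sample.
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "geneB": [2.0, 2.4, 1.8, 10.0, 10.2, 9.0],  # nearly a copy of geneA
    "geneC": [3.0, 1.0, 2.0, 2.5, 5.5, 4.0],
    "geneD": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
}
labels = [0, 0, 0, 1, 1, 1]
selected = select_genes(expr, labels, k=2)
print(selected)
```

In this toy example geneB is discriminative but almost perfectly correlated with geneA, so the redundancy check skips it in favour of the less correlated geneC.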
Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset
Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.
Degeneracy: a design principle for achieving robustness and evolvability
Robustness, the insensitivity of some of a biological system's
functionalities to a set of distinct conditions, is intimately linked to
fitness. Recent studies suggest that it may also play a vital role in enabling
the evolution of species. Increasing robustness, it is proposed, can lead to
the emergence of evolvability if evolution proceeds over a neutral network that
extends far throughout the fitness landscape. Here, we show that the design
principles used to achieve robustness dramatically influence whether robustness
leads to evolvability. In simulation experiments, we find that purely redundant
systems have remarkably low evolvability while degenerate, i.e. partially
redundant, systems tend to be orders of magnitude more evolvable. Surprisingly,
the magnitude of observed variation in evolvability can be explained neither by
differences in the size nor in the topology of the neutral networks. This suggests
that degeneracy, a ubiquitous characteristic in biological systems, may be an
important enabler of natural evolution. More generally, our study provides
valuable new clues about the origin of innovations in complex adaptive systems. Comment: Accepted in the Journal of Theoretical Biology (Nov 2009)
Effective Discriminative Feature Selection with Non-trivial Solutions
Feature selection and feature transformation, the two main ways to reduce
dimensionality, are often presented separately. In this paper, a feature
selection method is proposed by combining the popular transformation based
dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity
regularization. We impose row sparsity on the transformation matrix of LDA
through $\ell_{2,1}$-norm regularization to achieve feature selection, and
the resultant formulation optimizes for selecting the most discriminative
features and removing the redundant ones simultaneously. The formulation is
extended to the $\ell_{2,p}$-norm regularized case, which is more likely to
offer better sparsity when $0 < p < 1$. Thus the formulation is a better
approximation to the feature selection problem. An efficient algorithm is
developed to solve the $\ell_{2,p}$-norm based optimization problem and it is
proved that the algorithm converges when $0 < p \le 2$. Systematic experiments
are conducted to understand how the proposed method works. Promising
experimental results on various types of real-world data sets demonstrate the
effectiveness of our algorithm.
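Row sparsity of a transformation matrix is commonly imposed with the following mixed norms. These are the standard definitions, given here only as a reference sketch; the symbols $W \in \mathbb{R}^{d \times m}$ (transformation matrix) and $w^i$ (its $i$-th row, one row per feature) are notation assumed for illustration and are not taken from the abstract.

```latex
\|W\|_{2,1} = \sum_{i=1}^{d} \|w^i\|_2 ,
\qquad
\|W\|_{2,p}^{p} = \sum_{i=1}^{d} \|w^i\|_2^{\,p}
               = \sum_{i=1}^{d} \Bigl( \sum_{j=1}^{m} W_{ij}^2 \Bigr)^{p/2} .
```

A row $w^i$ that is driven to zero means feature $i$ contributes nothing to the projection and can be discarded. For $p < 1$ the second expression is a quasi-norm rather than a true norm.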
Hopfield Networks in Relevance and Redundancy Feature Selection Applied to Classification of Biomedical High-Resolution Micro-CT Images
We study filter-based feature selection methods for classification of biomedical images. For feature selection, we use two filters: a relevance filter, which measures the usefulness of individual features for target prediction, and a redundancy filter, which measures similarity between features. As a selection method that combines relevance and redundancy, we try out a Hopfield network. We experimentally compare selection methods, running unitary redundancy and relevance filters, against a greedy algorithm with redundancy thresholds [9], the min-redundancy max-relevance integration [8,23,36], and our Hopfield network selection. We conclude that, on the whole, Hopfield selection was one of the most successful methods, outperforming min-redundancy max-relevance when more features are selected.
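To make the interplay of a relevance filter and a redundancy filter concrete, here is a minimal sketch of greedy min-redundancy max-relevance selection in its difference form. It illustrates the baseline named in the abstract, not the Hopfield network selection, and the function name, the scoring rule and the toy scores are assumptions made for illustration.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy min-redundancy max-relevance selection (difference form).

    relevance[i]     : usefulness of feature i for predicting the target
    redundancy[i][j] : similarity between features i and j
    Returns the indices of the selected features in selection order.
    """
    n = len(relevance)
    # Start with the single most relevant feature.
    selected = [max(range(n), key=lambda i: relevance[i])]
    while len(selected) < min(k, n):
        def score(i):
            # Relevance minus mean redundancy with the features chosen so far.
            penalty = sum(redundancy[i][j] for j in selected) / len(selected)
            return relevance[i] - penalty
        candidates = [i for i in range(n) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Hypothetical filter outputs for four features.
relevance = [0.9, 0.85, 0.5, 0.1]
redundancy = [
    [1.0, 0.95, 0.1, 0.0],
    [0.95, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]
chosen = mrmr_select(relevance, redundancy, k=2)
print(chosen)
```

Feature 1 is the second most relevant, but it is nearly a duplicate of feature 0, so the less relevant and less redundant feature 2 is picked second.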
Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality
In an era where accumulating data is easy and storing it inexpensive, feature
selection plays a central role in helping to reduce the high dimensionality of
huge amounts of otherwise meaningless data. In this paper, we propose a
graph-based method for feature selection that ranks features by identifying the
most important ones within an arbitrary set of cues. Mapping the problem on an
affinity graph, where features are the nodes, the solution is given by assessing
the importance of nodes through some indicators of centrality, in particular,
the Eigenvector Centrality (EC). The gist of EC is to estimate the importance
of a feature as a function of the importance of its neighbors. Ranking central
nodes identifies candidate features, which turn out to be effective from a
classification point of view, as shown by a thorough experimental section.
Our approach has been tested on 7 diverse datasets from recent literature
(e.g., biological data and object recognition, among others), and compared
against filter, embedded and wrapper methods. The results are remarkable in
terms of accuracy, stability and low execution time. Comment: Preprint version - Lecture Notes in Computer Science - Springer 201
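The idea of ranking features by eigenvector centrality can be sketched with plain power iteration on a feature affinity matrix. This is a minimal illustration under assumptions: the abstract does not say how the affinity weights are built, so the matrix below is hypothetical, and the function names are invented for this example.

```python
def eigenvector_centrality(affinity, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of a non-negative symmetric
    affinity matrix by power iteration; entry i scores feature i."""
    n = len(affinity)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(affinity[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return v  # empty graph: every feature keeps its initial score
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

def rank_features(affinity):
    """Return feature indices sorted from most to least central."""
    scores = eigenvector_centrality(affinity)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical affinity graph over four features (nodes).
affinity = [
    [0.0, 0.8, 0.7, 0.1],
    [0.8, 0.0, 0.6, 0.1],
    [0.7, 0.6, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
ranking = rank_features(affinity)
print(ranking)
```

A feature scores highly when it is strongly connected to other high-scoring features, which is why the weakly connected feature 3 ends up last.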
Networked buffering: a basic mechanism for distributed robustness in complex adaptive systems
A generic mechanism - networked buffering - is proposed for the generation of robust traits in complex systems. It requires two basic conditions to be satisfied: 1) agents are versatile enough to perform more than one single functional role within a system and 2) agents are degenerate, i.e. there exists partial overlap in the functional capabilities of agents. Given these prerequisites, degenerate systems can readily produce a distributed systemic response to local perturbations. Reciprocally, excess resources related to a single function can indirectly support multiple unrelated functions within a degenerate system. In models of genome:proteome mappings for which localized decision-making and modularity of genetic functions are assumed, we verify that such distributed compensatory effects cause enhanced robustness of system traits. The conditions needed for networked buffering to occur are neither demanding nor rare, supporting the conjecture that degeneracy may fundamentally underpin distributed robustness within several biotic and abiotic systems. For instance, networked buffering offers new insights into systems engineering and planning activities that occur under high uncertainty. It may also help explain recent developments in understanding the origins of resilience within complex ecosystems.