201 research outputs found

    MidA is a putative methyltransferase that is required for mitochondrial complex I function

    10 pages, 6 figures. Dictyostelium and human MidA are homologous proteins that belong to a family of proteins of unknown function called DUF185. Using yeast two-hybrid screening and pull-down experiments, we showed that both proteins interact with the mitochondrial complex I subunit NDUFS2. Consistent with this, Dictyostelium cells lacking MidA showed a specific defect in complex I activity, and knockdown of human MidA in HEK293T cells resulted in reduced levels of assembled complex I. These results indicate a role for MidA in complex I assembly or stability. A structural bioinformatics analysis suggested the presence of a methyltransferase domain; this was further supported by site-directed mutagenesis of specific residues from the putative catalytic site. Interestingly, this complex I deficiency in a Dictyostelium midA- mutant causes a complex phenotypic outcome, which includes phototaxis and thermotaxis defects. We found that these aspects of the phenotype are mediated by a chronic activation of AMPK, revealing a possible role of AMPK signaling in complex I cytopathology. This work was supported by grants BMC2006-00394 and BMC2009-09050 to R.E. from the Spanish Ministerio de Ciencia e Innovación; to P.R.F. from the Thyne Reid Memorial Trusts and the Australian Research Council; to A.V. and O.G. from the Spanish National Bioinformatics Institute (www.inab.org), a platform of Genome Spain; to R.G. from the Fondo de Investigaciones Sanitarias, Instituto de Salud Carlos III, Spain (PI070167) and from the Comunidad de Madrid (GEN-0269/2006). S.C. is supported by a research contract from Consejería de Educación de la Comunidad de Madrid y del Fondo Social Europeo (FSE). Peer Reviewed

    An experimental study of the intrinsic stability of random forest variable importance measures

    BACKGROUND: The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Although traditional stability under data perturbations or parameter variations has been studied extensively, few studies consider the influence of the intrinsic randomness in generating VIMs, i.e., bagging, randomization and permutation. To address these influences, in this paper we introduce a new concept of intrinsic stability of VIMs, which is defined as the self-consistency among feature rankings in repeated runs of VIMs without data perturbations and parameter variations. Two widely used VIMs, i.e., Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), are comprehensively investigated. The motivation of this study is two-fold. First, we empirically verify the prevalence of intrinsic stability of VIMs over many real-world datasets to highlight that the instability of VIMs does not originate exclusively from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. Second, through Spearman and Pearson tests we comprehensively investigate how different factors influence the intrinsic stability. RESULTS: The experiments are carried out on 19 benchmark datasets with diverse characteristics, including 10 high-dimensional and small-sample gene expression datasets. Experimental results demonstrate the prevalence of intrinsic stability of VIMs. Spearman and Pearson tests on the correlations between intrinsic stability and different factors show that #feature (number of features) and #sample (size of sample) have a coupling effect on the intrinsic stability. The synthetic indicator, #feature/#sample, shows both negative monotonic correlation and negative linear correlation with the intrinsic stability, while OOB accuracy has monotonic correlations with intrinsic stability. This indicates that high-dimensional, small-sample and high-complexity datasets may suffer more from intrinsic instability of VIMs. Furthermore, with respect to parameter settings of random forest, a large number of trees is preferred. No significant correlations can be seen between intrinsic stability and other factors. Finally, the magnitude of intrinsic stability is always smaller than that of traditional stability. CONCLUSION: First, the prevalence of intrinsic stability of VIMs demonstrates that the instability of VIMs not only comes from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. This finding gives a better understanding of VIM stability, and may help reduce the instability of VIMs. Second, by investigating the potential factors of intrinsic stability, users would be more aware of the risks and hence more careful when using VIMs, especially on high-dimensional, small-sample and high-complexity datasets
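
    The abstract defines intrinsic stability as the self-consistency among feature rankings across repeated runs on unchanged data, but it does not say how that consistency is scored. As a minimal sketch, one plausible choice is the mean pairwise Spearman rank correlation between the importance vectors of the runs. The function names and the choice of Spearman correlation as the agreement score are assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations
from statistics import mean


def ranks(scores):
    """Return ranks (1 = smallest score); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def intrinsic_stability(runs):
    """Mean pairwise Spearman correlation over importance vectors from repeated runs.

    Assumption: each element of `runs` holds one importance score per feature,
    produced on the same data with the same parameters and a different seed.
    """
    return mean(spearman(a, b) for a, b in combinations(runs, 2))


if __name__ == "__main__":
    # Hypothetical importance scores for four features over three repeated runs.
    example_runs = [
        [0.40, 0.30, 0.20, 0.10],
        [0.38, 0.33, 0.18, 0.11],
        [0.30, 0.41, 0.19, 0.10],
    ]
    print(intrinsic_stability(example_runs))
```

    A value near 1 would mean the repeated runs rank the features almost identically; lower values indicate that the forest's own randomness is reshuffling the ranking.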

    A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

    Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes

    Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

    The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease, on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them. Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of the data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures. Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution

    Quantum Pareto Optimal Control

    We describe algorithms, and experimental strategies, for the Pareto optimal control problem of simultaneously driving an arbitrary number of quantum observable expectation values to their respective extrema. Conventional quantum optimal control strategies are less effective at sampling points on the Pareto frontier of multiobservable control landscapes than they are at locating optimal solutions to single observable control problems. The present algorithms facilitate multiobservable optimization by following direct paths to the Pareto front, and are capable of continuously tracing the front once it is found to explore families of viable solutions. The numerical and experimental methodologies introduced are also applicable to other problems that require the simultaneous control of large numbers of observables, such as quantum optimal mixed state preparation. Comment: Submitted to Physical Review

    Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

    Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. The findings of the present study have two important practical implications. First, high-throughput studies, by avoiding under-powered data analysis protocols, can achieve substantial economies in the sample size required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. Far fewer samples than previously thought may be sufficient for exploratory studies as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests

    Supplemental Information 6: Data S1.

    Microorganisms that reside on and in mammals, such as bats, have the potential to influence their host’s health and to provide defenses against invading pathogens. However, we have little understanding of the skin and fur bacterial microbiota on bats, or factors that influence the structure of these communities. The southwestern United States offers excellent sites for the study of external bat bacterial microbiota due to the diversity of bat species, the variety of abiotic and biotic factors that may govern bat bacterial microbiota communities, and the absence in the southwest of white-nose syndrome (WNS), a newly emergent fungal disease in bats. To test these variables, we used 16S rRNA gene 454 pyrosequencing from swabs of external skin and fur surfaces from 163 bats from 13 species sampled from southeastern New Mexico to northwestern Arizona. Community similarity patterns, random forest models, and generalized linear mixed-effects models show that factors such as location (e.g., cave-caught versus surface-netted) and ecoregion are major contributors to the structure of bacterial communities on bats. Bats caught in caves had a distinct microbial community compared to those that were netted on the surface. Our results provide a first insight into the distribution of skin and fur bat bacteria in the WNS-free environment of New Mexico and Arizona. More importantly, they provide a baseline of bat external microbiota that can be explored for potential natural defenses against pathogens