6 research outputs found

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set; and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.
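
The aggregation and randomization steps described in this abstract can be illustrated with a minimal sketch. It assumes that a composite feature is the mean expression of the genes in one set and that randomization is a permutation of gene labels. Neither choice is confirmed by the abstract, and the function names `composite_features` and `randomize_gene_sets` are hypothetical.

```python
import numpy as np


def composite_features(expr, gene_sets):
    """Build one composite feature per gene set.

    expr: array of shape (n_samples, n_genes) with expression levels.
    gene_sets: list of lists of gene (column) indices.
    Assumption: aggregation is a plain mean over the genes in each set.
    """
    return np.column_stack([expr[:, idx].mean(axis=1) for idx in gene_sets])


def randomize_gene_sets(gene_sets, n_genes, rng):
    """Relabel genes with a random permutation.

    Assumption: this is one simple way to discard the biological meaning
    of the sets while keeping their sizes and overlaps unchanged.
    """
    perm = rng.permutation(n_genes)
    return [[int(perm[g]) for g in idx] for idx in gene_sets]


# Toy data: 3 samples, 4 genes, two gene sets of size 2.
expr = np.arange(12, dtype=float).reshape(3, 4)
gene_sets = [[0, 1], [2, 3]]

features = composite_features(expr, gene_sets)
randomized_sets = randomize_gene_sets(gene_sets, 4, np.random.default_rng(0))
random_features = composite_features(expr, randomized_sets)
```

Either feature matrix could then be passed to the same classifier and validation procedure, which is the kind of like-for-like comparison the abstract describes.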

    Genome-wide discovery of disease modifiers

    Disease modifiers are genes that, when activated, can alter the expression of a phenotype associated with a disease. This can be done directly by affecting the expression of another gene that is causing the disease, or indirectly by affecting other factors that contribute to the phenotype’s variability. Identification of disease modifiers is of great interest from both treatment and genetic counseling perspectives. We set out here to develop computational approaches to identify and study disease modifiers. We focus on two research avenues for studying disease modifiers: (1) one aimed at identifying and investigating modifiers of cancer, a complex disease influenced by multiple genetic and environmental factors, and (2) the other focused on identifying disease modifiers for monogenic disorders, which involve a single disease-causing gene. Towards the first aim of studying cancer modifiers, we take four complementary approaches. (a) First, we developed a computational approach to identify metabolic drivers of cancer that, when applied to colorectal cancer, successfully identified FUT9 as a gene that strongly modifies tumor aggressiveness. (b) Second, to study metabolic pathway-level modifications in cancer, we developed an algorithm that summarizes cancer modifications to generate pathway compositions that best capture cancer-associated alterations, which, as we show, enhances cancer classification and survival prediction. (c) Third, to identify modifiers of cancer immunotherapy treatment, we developed a new computational approach that robustly predicts the response to immune checkpoint blockade therapy. (d) Fourth, to identify modifiers of cancer radiotherapy treatment, we built a robust predictor of rectal cancer patients’ response to chemo-radiation therapy (CRT), identifying a signature of genes that may serve as potential targets for modifying patients’ response to CRT.
Towards the second aim of studying genetic modifiers of Mendelian diseases, we developed a computational approach for identifying a specific expression pattern associated with genes that modify disease severity. We show that we can successfully prioritize genes that modify disease severity in cystic fibrosis and spinal muscular atrophy, where we have identified a new modifier and validated it experimentally. As will become evident from reading my dissertation, my work has naturally focused on developing a variety of computational approaches to analyze research questions that were of interest to me. My work has also benefited greatly from, and been significantly enriched by, close collaboration with many experimental labs that kindly embarked on testing the predictions made, and to whom I am indebted. In sum, we developed methods to identify and study disease modifiers for both cancer and Mendelian diseases. The application of these methods generates a few promising leads for advancing the treatment of these diseases and improving clinical decision-making.

    Biclustering of gene expression data based on scatter search (Biclustering sobre datos de expresión génica basado en búsqueda dispersa)

    Gene expression data, with their particular nature and importance, motivate not only the development of new techniques but also the formulation of new problems, such as the biclustering problem. Biclustering is an unsupervised learning technique that groups both genes and conditions. This double grouping distinguishes it from traditional clustering on this type of data, which groups either genes or conditions but not both. This thesis presents a new biclustering algorithm that allows different search criteria to be studied. The algorithm uses a scatter search scheme, which makes the search mechanism independent of the criterion used. Three different search criteria have been studied, and they motivate the three main contributions of the thesis. First, the linear correlation between genes is studied and integrated as part of the objective function used by the biclustering algorithm. Linear correlation makes it possible to find biclusters with shifting and scaling patterns, which improves on previous proposals. Second, motivated by the biological significance of activation-inhibition patterns between genes, the linear correlation is modified so that these patterns are taken into account. Finally, the information available about genes in public repositories, such as the Gene Ontology (GO), has been taken into account and incorporated as part of the search criterion. An extra term is added that reflects, for each bicluster evaluated, the quality of that group of genes according to the information stored about them in GO. Two options for this biological information integration term are studied and compared with each other, and the results are shown to be better when biological information is used in the biclustering algorithm.
The three contributions described, together with a series of intermediate steps, have led to results published in journals as well as in national and international conferences.
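
A correlation-based bicluster criterion of the kind this abstract describes might be sketched as follows. This is an illustration only, not the objective function used in the thesis: it assumes the score is the mean pairwise Pearson correlation between the selected genes over the selected conditions, with an optional absolute value so that negatively correlated (activation-inhibition) genes also count. The function name `bicluster_correlation` and the `use_abs` flag are hypothetical.

```python
import numpy as np


def bicluster_correlation(data, genes, conditions, use_abs=False):
    """Mean pairwise Pearson correlation between the genes of a bicluster.

    data: array of shape (n_genes, n_conditions).
    genes, conditions: index lists defining the bicluster; at least two
    genes and two conditions are needed, and no selected row may be constant.
    use_abs: if True, negative correlations count as strongly as positive
    ones (an assumed way to reward activation-inhibition patterns).
    """
    sub = data[np.ix_(genes, conditions)]
    corr = np.corrcoef(sub)  # gene-by-gene correlation matrix
    upper = np.triu_indices(len(genes), k=1)  # each gene pair once
    values = corr[upper]
    if use_abs:
        values = np.abs(values)
    return float(values.mean())


# Toy data: gene 1 is a shifted and scaled copy of gene 0,
# gene 2 is the mirror image of gene 0.
base = np.array([1.0, 2.0, 4.0, 3.0])
data = np.array([base, 2.0 * base + 5.0, -base])
all_conditions = [0, 1, 2, 3]

shift_scale_score = bicluster_correlation(data, [0, 1], all_conditions)
signed_score = bicluster_correlation(data, [0, 1, 2], all_conditions)
abs_score = bicluster_correlation(data, [0, 1, 2], all_conditions, use_abs=True)
```

Pearson correlation is unchanged when a gene profile is shifted or rescaled by a positive factor, which is why genes 0 and 1 score as a perfect pair here. A search procedure such as scatter search would treat a score like this as one term of its objective.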

    Computational methods for breast cancer diagnosis, prognosis, and treatment prediction

    The research presented here develops a robust algorithm for identifying reliable protein interactions, which can be combined with a gene expression dataset to improve algorithm performance, as well as novel diagnostic, prognostic, and treatment prediction algorithms for breast cancer that take existing issues into account in order to provide a fair estimate of their performance.