7 research outputs found

    Graph ranking for exploratory gene data analysis

    Abstract
    Background: Microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously in a single experiment. However, the large number of genes makes the resulting data much harder to analyze, comprehend and interpret. Selecting a subset of important genes is necessary to address this challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are not sufficient for accurate inference of the underlying biology, because biologically significant genes are not necessarily statistically significant. Additional biological knowledge needs to be integrated into the gene selection procedure.
    Results: We propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationship between genes and their associated molecular functions. Under a species condition, the edge weights of the graph are set to the gene expression levels. Such a graph provides a mathematical means to represent both species-independent and species-dependent biological information. We also develop a new ranking algorithm that analyzes the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, genes and molecular functions can be ranked simultaneously by a real-valued measure, KSD, which incorporates the global and local structure of the graph. Over-expressed and under-regulated genes can also be ranked separately.
    Conclusion: The gene-function bigraph integrates molecular function annotations into gene expression data. The relevance of genes is described in the graph (through a common function). The proposed method provides an exploratory framework for gene data analysis.
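    The kernelized spatial depth ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each gene is represented by its vector of edge weights to the molecular functions in the bigraph, and it uses a Gaussian kernel and a 1/n normalization; the abstract fixes none of these choices, and the gene names and weights below are invented for illustration only.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, sample, kernel=gaussian_kernel):
    """Return 1 - ||(1/n) * sum_i S(phi(x) - phi(y_i))|| using kernel values only.

    S(v) = v / ||v|| is the spatial sign in feature space, with S(0) = 0.
    """
    kxx = kernel(x, x)
    kxs = [kernel(x, y) for y in sample]
    # Feature-space distance between x and each sample point.
    dist = [math.sqrt(max(kxx + kernel(y, y) - 2.0 * kxy, 0.0))
            for y, kxy in zip(sample, kxs)]
    total = 0.0
    for i, yi in enumerate(sample):
        if dist[i] == 0.0:  # S(0) = 0, so this point contributes nothing
            continue
        for j, yj in enumerate(sample):
            if dist[j] == 0.0:
                continue
            # Inner product <phi(x) - phi(y_i), phi(x) - phi(y_j)>.
            inner = kxx + kernel(yi, yj) - kxs[i] - kxs[j]
            total += inner / (dist[i] * dist[j])
    return 1.0 - math.sqrt(max(total, 0.0)) / len(sample)


# Hypothetical edge weights: one row per gene, one column per molecular function.
gene_weights = {
    "geneA": [2.0, 0.0, 1.0],
    "geneB": [1.8, 0.2, 1.1],
    "geneC": [2.1, 0.1, 0.9],
    "geneD": [0.0, 5.0, 0.0],
}
sample = list(gene_weights.values())
depths = {g: kernelized_spatial_depth(w, sample) for g, w in gene_weights.items()}
ranking = sorted(depths, key=depths.get, reverse=True)  # deepest (most central) first
print(ranking)
```

    In this toy data the outlying profile (geneD) receives the lowest depth, which is the behavior a depth-based ranking is meant to capture.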

    PIMKL: Pathway Induced Multiple Kernel Learning

    Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks
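    As a rough illustration of a pathway-induced kernel, the sketch below builds one kernel per pathway from the Laplacian of the interaction network restricted to that pathway's genes, then mixes the kernels with fixed weights. Everything here is an assumption for illustration: the unnormalized Laplacian, the function names, the toy network and the hand-set weights. The abstract states that the mixture is optimized by an MKL algorithm, which this sketch does not implement.

```python
def laplacian(genes, edges):
    """Unnormalized graph Laplacian L = D - A of the network restricted to `genes`.

    Assumes `edges` lists each undirected interaction once.
    """
    idx = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u in idx and v in idx and u != v:
            i, j = idx[u], idx[v]
            L[i][j] -= 1.0
            L[j][i] -= 1.0
            L[i][i] += 1.0
            L[j][j] += 1.0
    return L


def pathway_kernel(x, y, genes, L):
    """k(x, y) = x_P^T L y_P, where x_P is the expression of the pathway genes."""
    xp = [x[g] for g in genes]
    yp = [y[g] for g in genes]
    n = len(genes)
    return sum(xp[i] * L[i][j] * yp[j] for i in range(n) for j in range(n))


def combined_kernel(x, y, pathways, edges, weights):
    """Weighted sum of pathway kernels; in MKL these weights would be learned."""
    return sum(w * pathway_kernel(x, y, genes, laplacian(genes, edges))
               for w, genes in zip(weights, pathways))


# Hypothetical interaction network, pathways and expression profiles.
edges = [("g1", "g2"), ("g2", "g3")]
pathways = [["g1", "g2", "g3"], ["g2", "g3"]]
sample_a = {"g1": 1.0, "g2": 3.0, "g3": 0.0}
sample_b = {"g1": 0.5, "g2": 1.0, "g3": 2.0}

k_aa = combined_kernel(sample_a, sample_a, pathways, edges, [0.5, 0.5])
k_ab = combined_kernel(sample_a, sample_b, pathways, edges, [0.5, 0.5])
print(k_aa, k_ab)
```

    Because each kernel is tied to a named pathway, the learned weights can be read as the relative importance of pathways, which is the interpretability argument made in the abstract.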

    Network-Based Biomarker Discovery: Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge

    Advances in genome science and technology offer a deeper understanding of biology while also improving the practice of medicine. Expression profiling of some diseases, such as cancer, allows the identification of marker genes that may help diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels discriminate between different classes of disease or between groups of patients with different clinical outcomes (e.g. therapy response, survival time, etc.). A current challenge is to identify new markers that are directly related to the underlying disease mechanism.
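    The scoring step described here, ranking genes by how well their expression separates two classes, can be sketched with a simple per-gene statistic. Welch's t-statistic is used below purely as an assumed example of such a score; the abstract does not name a scoring method, and the gene names and values are invented.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic between two groups of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Degenerate case: no within-group spread in either group.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expression, labels):
    """Rank genes by |t| between samples labelled 0 and samples labelled 1."""
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression values for six samples (three per class).
expression = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
labels = [0, 0, 0, 1, 1, 1]
gene_ranking = rank_genes(expression, labels)
print(gene_ranking)
```

    A purely statistical score like this one ignores network context, which is the gap that the network-based approach in the title sets out to address.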

    Bioinformatics analysis of microarray transcriptional data for the study of molecular mechanisms of ageing and the investigation of mechanisms of Alzheimer's disease progression in the model organism Caenorhabditis elegans

    Ageing is a physiological process defined as the gradual decline of organismal homeostasis and of physiological functions throughout the body, and it is associated with an increased risk of age-related disease. Its progression is influenced by genetic and environmental factors, and its regulatory mechanisms are highly conserved across species. In humans, age-dependent diseases such as Alzheimer's disease frequently appear as ageing progresses. The nematode C. elegans is a powerful model organism for ageing because of its genetics, its relatively short lifespan, and the high homology of its genome with the human genome. We performed a comparative bioinformatics meta-analysis of microarray transcriptomics datasets from experiments on the effect of natural compounds on lifespan extension in C. elegans, using a transcriptional dataset on natural ageing progression as the baseline for comparison. Likewise, we studied the effect of natural compounds on delaying the paralysis phenotype, using as the baseline datasets from transgenic C. elegans strains that model Alzheimer's disease and display paralysis. Pathways related to innate immunity, DNA damage response, proteostasis, metabolism, antioxidant mechanisms and post-translational modifications emerged as important for these phenomena. Our ultimate goal is to reveal patterns of gene expression that are induced in the presence of potential anti-ageing or therapeutic natural substances.