Graph ranking for exploratory gene data analysis
Background: Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes in a single experiment. However, the large number of genes greatly increases the challenges of analyzing, comprehending and interpreting the resulting mass of data. Selecting a subset of important genes is necessary to address this challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are not sufficient for accurate inference of the underlying biology, because biologically significant genes are not necessarily statistically significant. Additional biological knowledge needs to be integrated into the gene selection procedure.
Results: We propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationship between genes and their associated molecular functions. Under a given species condition, the edge weights of the graph are set to the gene expression levels. Such a graph provides a mathematical means to represent both species-independent and species-dependent biological information. We also develop a new ranking algorithm that analyzes the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, the importance of genes and molecular functions can be ranked simultaneously by a real-valued measure, KSD, which incorporates the global and local structure of the graph. Over-expressed and under-regulated genes can also be ranked separately.
Conclusion: The gene-function bigraph integrates molecular function annotations into gene expression data. The relevance of genes is described in the graph (through a common function). The proposed method provides an exploratory framework for gene data analysis.
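The gene-function bigraph and depth-based ranking described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy annotation matrix, the expression values, the choice of a Gaussian kernel with bandwidth `sigma`, and the idea of representing each gene by its row of edge weights are all assumptions made for illustration. The depth function follows one common sample formulation of kernelized spatial depth, in which the depth of a point is one minus the norm of the average unit vector pointing from it to the other points in kernel feature space.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernelized_spatial_depth(K):
    """Sample KSD of each point relative to the others, given kernel matrix K."""
    n = K.shape[0]
    diag = np.diag(K)
    # delta[i, j] is the feature-space distance between points i and j.
    delta = np.sqrt(np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0))
    depths = np.ones(n)
    for i in range(n):
        # Points coinciding with point i contribute a zero vector, so skip them.
        idx = np.flatnonzero(delta[i] > 1e-12)
        if idx.size == 0:
            continue
        k_i = K[i, idx]
        # gram[a, b] = <phi(x_i) - phi(y_a), phi(x_i) - phi(y_b)>
        gram = K[i, i] + K[np.ix_(idx, idx)] - k_i[:, None] - k_i[None, :]
        total = np.sum(gram / np.outer(delta[i, idx], delta[i, idx]))
        depths[i] = 1.0 - np.sqrt(max(total, 0.0)) / (n - 1)
    return depths

# Hypothetical toy bigraph: 5 genes x 3 molecular functions.
# annotation[g, f] = 1 if gene g is annotated with function f (GO side).
annotation = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 1, 1],
                       [1, 1, 1],
                       [0, 0, 1]], dtype=float)
expression = np.array([2.0, 0.5, 1.5, 1.0, 3.0])  # assumed expression levels
weights = annotation * expression[:, None]        # edge weights of the bigraph

gene_depth = kernelized_spatial_depth(rbf_kernel(weights, sigma=2.0))
function_depth = kernelized_spatial_depth(rbf_kernel(weights.T, sigma=2.0))
gene_order = np.argsort(-gene_depth)  # genes sorted from deepest to most outlying
print(gene_order, gene_depth.round(3), function_depth.round(3))
```

Rows of `weights` describe genes and columns describe functions, so the same depth routine yields a score for both sides of the graph. How the scores map to biological importance is left to the original paper.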
PIMKL: Pathway Induced Multiple Kernel Learning
Reliable identification of molecular biomarkers is essential for accurate
patient stratification. While state-of-the-art machine learning approaches for
sample classification continue to push boundaries in terms of performance, most
of these methods are not able to integrate different data types and lack
generalization power, limiting their application in a clinical setting.
Furthermore, many methods behave as black boxes, and we have very little
understanding about the mechanisms that lead to the prediction. While
opaqueness concerning machine behaviour might not be a problem in deterministic
domains, in health care, providing explanations about the molecular factors and
phenotypes that are driving the classification is crucial to build trust in the
performance of the predictive system. We propose Pathway Induced Multiple
Kernel Learning (PIMKL), a novel methodology to reliably classify samples that
can also help gain insights into the molecular mechanisms that underlie the
classification. PIMKL exploits prior knowledge in the form of a molecular
interaction network and annotated gene sets, by optimizing a mixture of
pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an
approach that has demonstrated excellent performance in different machine
learning applications. After optimizing the combination of kernels for
prediction of a specific phenotype, the model provides a stable molecular
signature that can be interpreted in the light of the ingested prior knowledge
and that can be used in transfer learning tasks.
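The idea of mixing pathway-induced kernels can be sketched as follows. This is a simplified illustration under stated assumptions, not the PIMKL implementation: each pathway kernel is assumed to be the bilinear form x^T L y, with L the normalized Laplacian of the pathway's subnetwork, and the MKL optimizer is replaced by a simple kernel-target alignment heuristic for the mixture weights. The network, gene sets, data, and function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency matrix."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def pathway_kernel(X, A, genes):
    """Kernel between samples restricted to one gene set and its subnetwork."""
    Xp = X[:, genes]
    L = normalized_laplacian(A[np.ix_(genes, genes)])
    return Xp @ L @ Xp.T

def alignment_weights(kernels, y):
    """Stand-in for MKL: weight each kernel by its alignment with y y^T."""
    target = np.outer(y, y)
    scores = []
    for K in kernels:
        denom = np.linalg.norm(K) * np.linalg.norm(target)
        scores.append(max(np.sum(K * target) / denom, 0.0) if denom > 0 else 0.0)
    scores = np.array(scores)
    if scores.sum() == 0:
        # No kernel aligns positively with the labels: fall back to uniform weights.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / scores.sum()

# Hypothetical toy data: 6 samples x 5 genes, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
y = np.array([1, 1, 1, -1, -1, -1])

# Hypothetical interaction network: a path graph over the 5 genes.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

pathways = {"pathway_a": [0, 1, 2], "pathway_b": [2, 3, 4]}  # assumed gene sets
kernels = [pathway_kernel(X, A, g) for g in pathways.values()]
w = alignment_weights(kernels, y)
K_mix = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(dict(zip(pathways, w.round(3))))
```

The mixture weights give one score per pathway, which is the kind of quantity that can be read as a molecular signature. The combined matrix `K_mix` could be passed to any kernel classifier that accepts a precomputed kernel.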
Network-Based Biomarker Discovery: Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge
Advances in genome science and technology offer a deeper understanding of biology while at the same time improving the practice of medicine. Expression profiling of diseases such as cancer makes it possible to identify marker genes that may help diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels discriminate between different classes of disease or between groups of patients with different clinical outcomes (e.g. therapy response or survival time). A current challenge is to identify new markers that are directly related to the underlying disease mechanism.
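Scoring genes by how well their expression separates two groups can be sketched with a per-gene Welch t-statistic. This is only one of many possible discrimination scores and is not stated in the abstract; the data and the function name `welch_t_scores` are hypothetical.

```python
import numpy as np

def welch_t_scores(X, y):
    """Per-gene Welch t-statistic between class 1 and class 0 samples.

    X: samples x genes expression matrix; y: array of 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Guard against zero variance in both groups.
    return (b.mean(axis=0) - a.mean(axis=0)) / np.maximum(se, 1e-12)

# Hypothetical toy data: 8 patients (4 per outcome group) x 3 genes.
X = np.array([
    [1.0, 1.0, 1.0],
    [1.1, 2.0, 1.5],
    [0.9, 1.0, 0.5],
    [1.0, 2.0, 1.0],
    [3.0, 2.0, 1.5],
    [3.1, 1.0, 2.0],
    [2.9, 2.0, 1.0],
    [3.0, 1.0, 1.5],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

scores = welch_t_scores(X, y)
ranking = np.argsort(-np.abs(scores))  # most discriminative gene first
print(ranking, scores.round(2))
```

A score of this kind treats each gene in isolation, which is the limitation that network-based approaches aim to address by bringing in prior knowledge about how genes interact.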
Bioinformatics analysis of microarray transcriptomic data for the study of molecular mechanisms of ageing and the investigation of mechanisms of Alzheimer's disease progression in the model organism Caenorhabditis elegans
Ageing is defined as a gradual decline of organismal homeostasis and of
physiological functions throughout the body, and is associated with an increased
risk of age-related disease. Lifespan in metazoans is influenced by genetic and
environmental factors. Lifespan-control mechanisms are remarkably conserved
across species. The nematode C. elegans is a powerful model system for ageing,
because of its genetics, its relatively short lifespan, and the high homology
of its genome with the human genome. We performed a comparative bioinformatics
meta-analysis of transcriptomics data from microarray experiments on natural
ageing progression, as well as on ageing following treatment with natural
compounds that eventually lead to lifespan extension. In addition, we analysed
data on the paralysis phenotype that simulates the progression of Alzheimer’s
disease in transgenic C. elegans strains, and on its delay by natural
compounds. We report that important pathways for these phenomena are related
to innate immunity, DNA damage response, metabolism, proteostasis, antioxidant
mechanisms and post-translational modifications. Our ultimate goal is to
reveal patterns of gene expression that are induced in the presence of
potential anti-ageing or therapeutic natural substances.