10 research outputs found

    Protein Networks as Logic Functions in Development and Cancer

    Many biological and clinical outcomes are based not on single proteins, but on modules of proteins embedded in protein networks. A fundamental question is how the proteins within each module contribute to the overall module activity. Here, we study the modules underlying three representative biological programs related to tissue development, breast cancer metastasis, or progression of brain cancer, respectively. For each case we apply a new method, called Network-Guided Forests, to identify predictive modules together with logic functions which tie the activity of each module to the activity of its component genes. The resulting modules implement a diverse repertoire of decision logic which cannot be captured using the simple approximations suggested in previous work such as gene summation or subtraction. We show that in cancer, certain combinations of oncogenes and tumor suppressors exert competing forces on the system, suggesting that medical genetics should move beyond cataloguing individual cancer genes to cataloguing their combinatorial logic

    Inferring the functions of longevity genes with modular subnetwork biomarkers of Caenorhabditis elegans aging

    An algorithm for determining networks from gene expression data enables the identification of genes potentially linked to aging in worms

    Elucidation of time-dependent systems biology cell response patterns with time course network enrichment

    Advances in OMICS technologies have produced both massive expression data sets and huge networks modelling the molecular interplay of genes, RNAs, proteins and metabolites. Network enrichment methods combine these two data types to extract subnetwork responses from case/control setups. However, no methods exist to integrate time series data with networks, thus preventing the identification of time-dependent systems biology responses. We close this gap with Time Course Network Enrichment (TiCoNE). It combines a new kind of human-augmented clustering with a novel approach to network enrichment. It finds temporal expression prototypes that are mapped to a network and investigated for enriched prototype pairs interacting more often than expected by chance. Such patterns of temporal subnetwork co-enrichment can be compared between different conditions. With TiCoNE, we identified the first distinguishing temporal systems biology profiles in time series gene expression data of human lung cells after infection with influenza virus and rhinovirus. TiCoNE is available online (https://ticone.compbio.sdu.dk) and as a Cytoscape app in the Cytoscape App Store (http://apps.cytoscape.org/)

    Identifying Causal Genes and Dysregulated Pathways in Complex Diseases

    In complex diseases, various combinations of genomic perturbations often lead to the same phenotype. On a molecular level, combinations of genomic perturbations are assumed to dys-regulate the same cellular pathways. Such a pathway-centric perspective is fundamental to understanding the mechanisms of complex diseases and the identification of potential drug targets. In order to provide an integrated perspective on complex disease mechanisms, we developed a novel computational method to simultaneously identify causal genes and dys-regulated pathways. First, we identified a representative set of genes that are differentially expressed in cancer compared to non-tumor control cases. Assuming that disease-associated gene expression changes are caused by genomic alterations, we determined potential paths from such genomic causes to target genes through a network of molecular interactions. Applying our method to sets of genomic alterations and gene expression profiles of 158 Glioblastoma multiforme (GBM) patients we uncovered candidate causal genes and causal paths that are potentially responsible for the altered expression of disease genes. We discovered a set of putative causal genes that potentially play a role in the disease. Combining an expression Quantitative Trait Loci (eQTL) analysis with pathway information, our approach allowed us not only to identify potential causal genes but also to find intermediate nodes and pathways mediating the information flow between causal and target genes. Our results indicate that different genomic perturbations indeed dys-regulate the same functional pathways, supporting a pathway-centric perspective of cancer. While copy number alterations and gene expression data of glioblastoma patients provided opportunities to test our approach, our method can be applied to any disease system where genetic variations play a fundamental causal role

    Optimisation approaches for data mining in biological systems

    Advances in data acquisition technologies have generated massive amounts of data that present a considerable challenge for analysis. How to efficiently and automatically mine the data and extract the maximum value by identifying hidden patterns is an active research area called data mining. This thesis tackles several problems in data mining, including data classification, regression analysis and community detection in complex networks, with considerable applications in various biological systems. First, the problem of data classification is investigated. An existing classifier has been adopted from the literature and two novel solution procedures have been proposed, which are shown to improve the predictive accuracy of the original method and significantly reduce the computational time. Disease classification using high-throughput genomic data is also addressed. To tackle the problem of analysing a large number of genes against a small number of samples, a new approach of incorporating extra biological knowledge and constructing higher-level composite features for classification has been proposed. A novel model has been introduced to optimise the construction of composite features. Subsequently, regression analysis is considered, where two piece-wise linear regression methods have been presented. The first method partitions one feature into multiple complementary intervals and fits each with a distinct linear function. The other method is a more generalised variant of the previous one and performs recursive binary partitioning that permits partitioning of multiple features. Lastly, community detection in complex networks is investigated, where a new optimisation framework is introduced to identify the modular structure hidden in directed networks via optimisation of modularity. A non-linear model is first proposed before its linearised variant is presented. The optimisation framework consists of two major steps: solving the non-linear model to identify a coarse initial partition, and then repeatedly solving the linearised models to refine the network partition
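
For orientation, a widely used definition of modularity for directed networks is given here. This is an assumption about the kind of objective meant by "optimisation of modularity" in the abstract above; the thesis itself may use a different variant. The symbols are illustrative: A_ij is 1 if there is an edge from node i to node j, m is the total number of edges, k_i^out and k_j^in are out- and in-degrees, and c_i is the community of node i.

```latex
Q = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{\mathrm{out}}\, k_j^{\mathrm{in}}}{m} \right] \delta(c_i, c_j)
```

The product of degrees in the numerator makes the objective non-linear in the assignment variables, which is consistent with the abstract's distinction between a non-linear model and its linearised variant.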

    Bioinformatics approaches for cancer research

    Cancer is the consequence of genetic alterations that influence the behavior of affected cells. While the phenotypic effects of cancer like infinite proliferation are common hallmarks of this complex class of diseases, the connections between the genetic alterations and these effects are not always evident. The growth of information generated by experimental high-throughput techniques makes it possible to combine heterogeneous data from different sources to gain new insights into these complex molecular processes. The demand on computational biology to develop tools and methods to facilitate the evaluation of such data has increased accordingly. To this end, we developed new approaches and bioinformatics tools for the analysis of high-throughput data. Additionally, we integrated these new approaches into our comprehensive C++ framework GeneTrail. GeneTrail presents a powerful package that combines information retrieval, statistical evaluation of gene sets, result presentation, and data exchange. To make GeneTrail's capabilities available to the research community, we implemented a graphical user interface in PHP and set up a webserver that is accessible worldwide. In this thesis, we discuss newly integrated algorithms and extensions of GeneTrail, as well as some comprehensive studies that have been performed with GeneTrail in the context of cancer research. We applied GeneTrail to analyze properties of tumor-associated antigens to elucidate the mechanisms of antigen candidate selection. Furthermore, we performed an extensive analysis of miRNAs and their putative target pathways and networks in cancer. In the field of differential network analysis, we employed a combination of expression values and topological data to identify patterns of deregulated subnetworks and putative key players for the deregulation. 
Signatures of deregulated subnetworks may help to predict the sensitivity of tumor subtypes to therapeutic agents and, hence, may be used in the future to guide the selection of optimal agents. Furthermore, the identified putative key players may represent oncogenes, tumor suppressor genes, or other genes that contribute to crucial changes of regulatory and signaling processes in cancer cells and may serve as potential targets for an individualized tumor therapy. With these applications, we demonstrate the usefulness of our GeneTrail package and hope that our work will contribute to a better understanding of cancer.

    Optimization in bioinformatics

    In this work, we present novel optimization approaches for important bioinformatics problems. The first part deals mainly with the local optimization of molecular structures and its applications to molecular docking, while the second part discusses discrete global optimization. In the first part, we present a novel algorithm for an old task: find the next local optimum in a given direction on a molecular potential energy function (line search). We show that replacing a standard line search method with the new algorithm reduced the number of function/gradient evaluations in our test runs down to 47.7% (down to 85% on average). Then, we include this method in our novel approach for locally optimizing flexible ligands in the presence of their receptors, which we describe in detail, avoiding the singularity problem of orientational parameters. We extend this approach to a full ligand-receptor docking program using a Lamarckian genetic algorithm. Our validation runs show that we gained an up to tenfold speedup in comparison to other tested methods. Then, we further incorporate side chain flexibility of the receptor into our approach and introduce limited backbone flexibility by interpolating between known extremal conformations using spherical linear extrapolation. Our results show that this approach is very promising for flexible ligand-receptor docking. However, the drawback is that we need known extremal backbone conformations for the interpolation. In the last section of the first part, we allow a loop region to be fully flexible. We present a new method to find all possible conformations using the Go-Scheraga ring closure equations and interval arithmetic. Our results show that this algorithm reliably finds alternative conformations and is able to identify promising loop/ligand complexes of the studied example. In the second part of this work, we describe the bond order assignment problem for molecular structures. 
We present our novel linear 0-1-programming formulation for the very efficient computation of all optimal and suboptimal bond order assignments and show that our approach outperforms not only the original heuristic approach of Wang et al. but also commonly used software for determining bond orders on our test set, considering all optimal results. This test set consists of 761 thoroughly prepared drug-like molecules that were originally used for the validation of the Merck Molecular Force Field. Then, we present our filter method for feature subset selection that is based on mutual information and uses second-order information. We show our mathematically well-motivated criterion and, in contrast to other methods, solve the resulting optimization problem exactly by quadratic 0-1-programming. In the validation runs, our method achieved the best classification accuracies in 18 out of 21 test scenarios. In the last section, we give our integer linear programming formulation for the detection of deregulated subgraphs in regulatory networks using expression profiles. Our approach identifies the subnetwork of a certain size of the regulatory network with the highest sum of node scores. To demonstrate the capabilities of our algorithm, we analyzed expression profiles from nonmalignant primary mammary epithelial cells derived from BRCA1 mutation carriers and epithelial cells without BRCA1 mutation. Our results suggest that oxidative stress plays an important role in epithelial cells with BRCA1 mutations that may contribute to the later development of breast cancer. The application of our algorithm to already published data can yield new insights. As expression data and network data are still growing, methods such as our algorithm will be valuable to detect deregulated subgraphs in different conditions and help contribute to a better understanding of diseases.
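
The last problem in the abstract above, finding the subnetwork of a given size with the highest sum of node scores, can be illustrated on a toy example. The sketch below is not the thesis's integer linear programming formulation: it is a brute-force enumeration that only scales to tiny graphs, and it assumes that the subnetwork must be connected and that edges can be treated as undirected. The function names, node names, scores and edges are all hypothetical.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph (edges treated as undirected)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency.get(current, ()):
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def best_subnetwork(scores, edges, k):
    """Enumerate all k-node subsets and keep the connected one with the highest score sum."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    best_nodes, best_score = None, float("-inf")
    for candidate in combinations(sorted(scores), k):
        if not is_connected(candidate, adjacency):
            continue  # assumption: only connected subnetworks are of interest
        total = sum(scores[n] for n in candidate)
        if total > best_score:
            best_nodes, best_score = candidate, total
    return best_nodes, best_score


# Hypothetical toy network: a path E - A - B - C - D with made-up node scores.
scores = {"A": 2.0, "B": 0.5, "C": 3.0, "D": 1.0, "E": 2.5}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]

nodes, total = best_subnetwork(scores, edges, 3)
print(nodes, total)
```

Enumeration grows combinatorially with the network size, which is one reason an exact mathematical programming formulation, as described in the abstract, is attractive for real regulatory networks.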