22 research outputs found

    A systems biology approach to investigate the response of Synechocystis sp. PCC6803 to a high salt environment.

    BACKGROUND: Salt overloading during agricultural processes is causing a decrease in crop productivity due to saline sensitivity. Salt-tolerant cyanobacteria share many cellular characteristics with higher plants and therefore make ideal model systems for studying salinity stress. Here, the response of fully adapted Synechocystis sp. PCC6803 cells to the addition of 6% w/v NaCl was investigated using proteomics combined with targeted analysis of transcripts. RESULTS: Isobaric mass tagging of peptides led to accurate relative quantitation and identification of 378 proteins, and approximately 40% of these were differentially expressed after incubation in BG-11 media supplemented with 6% salt for 9 days. Protein abundance changes were related to essential cellular functional alterations. Differentially expressed proteins involved in metabolic responses were also analysed using the probabilistic tool Mixed Model on Graphs (MMG), which highlighted the role of energy conversion through glycolysis and of reducing power through the pentose phosphate pathway. Temporal RT-qPCR experiments were also run to investigate protein expression changes at the transcript level for 14 non-metabolic proteins. In 9 out of 14 cases the mRNA changes were in accordance with the protein changes. CONCLUSION: Synechocystis sp. PCC6803 has the ability to regulate essential metabolic processes to enable survival in high salt environments. This adaptation strategy is assisted by further regulation of proteins involved in non-metabolic cellular processes, supported by transcriptional and post-transcriptional control. This study demonstrates the effectiveness of using a systems biology approach in answering environmental, and in particular salt adaptation, questions in Synechocystis sp. PCC6803.

    Global modeling of transcriptional responses in interaction networks

    Motivation: Cell-biological processes are regulated through a complex network of interactions between genes and their products. The processes, their activating conditions, and the associated transcriptional responses are often unknown. Organism-wide modeling of network activation can reveal unique and shared mechanisms between physiological conditions, and potentially as yet unknown processes. We introduce a novel approach for organism-wide discovery and analysis of transcriptional responses in interaction networks. The method searches for local, connected regions in a network that exhibit coordinated transcriptional response in a subset of conditions. Known interactions between genes are used to limit the search space and to guide the analysis. Validation on a human pathway network reveals physiologically coherent responses, functional relatedness between physiological conditions, and coordinated, context-specific regulation of the genes. Availability: Implementation is freely available in R and Matlab at http://netpro.r-forge.r-project.org. Comment: 19 pages, 13 figures.

    DETECTING CANCER-RELATED GENES AND GENE-GENE INTERACTIONS BY MACHINE LEARNING METHODS

    To understand the underlying molecular mechanisms of cancer, and thereby to improve the understanding of its pathogenesis as well as its prevention, diagnosis and treatment, it is necessary to explore the activities of cancer-related genes and the interactions among these genes. In this dissertation, I use machine learning and computational methods to identify differential gene relations and detect gene-gene interactions. To identify gene pairs that have different relationships in normal versus cancer tissues, I develop an integrative method based on the bootstrapping K-S test to evaluate a large number of microarray datasets. The experimental results demonstrate that my method can find meaningful alterations in gene relations. For gene-gene interaction detection, I propose to use two Bayesian Network based methods, DASSO-MB (Detection of ASSOciations using Markov Blanket) and EpiBN (Epistatic interaction detection using Bayesian Network model), to address the two critical challenges: searching and scoring. DASSO-MB is based on the concept of the Markov Blanket in Bayesian Networks. In EpiBN, I develop a new scoring function, which can reflect higher-order gene-gene interactions and detect the true number of disease markers, and apply a fast Branch-and-Bound (B&B) algorithm to learn the structure of the Bayesian Network. Both DASSO-MB and EpiBN outperform some other commonly used methods and are scalable to genome-wide data.
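The Markov Blanket concept mentioned in the abstract above can be illustrated with a minimal sketch. This is not the DASSO-MB algorithm, which learns the blanket from data; it only shows the standard definition (parents, children, and the children's other parents) on a graph whose structure is already known. The toy network and all node names are hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the Markov blanket of `node` in a DAG given as node -> set of parents."""
    # Children are the nodes that list `node` among their parents.
    children = {child for child, ps in parents.items() if node in ps}
    # Co-parents are all parents of those children (the node itself is removed below).
    co_parents = set().union(*(parents[child] for child in children))
    return (parents.get(node, set()) | children | co_parents) - {node}


# Hypothetical toy network: two SNPs influence a disease node, and the
# disease together with a third SNP influences a biomarker node.
toy_network = {
    "SNP1": set(),
    "SNP2": set(),
    "SNP3": set(),
    "Disease": {"SNP1", "SNP2"},
    "Biomarker": {"Disease", "SNP3"},
}

blanket = markov_blanket(toy_network, "Disease")
print(sorted(blanket))
```

Given its Markov blanket, a node is conditionally independent of every other node in the network, which is why blanket-based methods can restrict the search for disease-associated markers to a small set of variables.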

    New bioinformatic and statistical methods for the analysis of mass spectrometry-based phosphoproteomic data

    In living cells, reversible protein phosphorylation events propagate signals caused by external stimuli from the plasma membrane to their intracellular destinations. Aberrations in these signaling cascades can lead to diseases such as cancer. To identify and quantify phosphorylation events on a large scale, mass spectrometry (MS) has become the predominant technology. The large amount of data generated by MS requires efficient, tailor-made computational tools in order to draw meaningful biological conclusions. In this work, four new methods for analyzing MS-based phosphoproteomic data are presented. The first method, called SubExtractor, combines phosphoproteomic data with protein network information to identify differentially regulated subnetworks. The method is based on a Bayesian probabilistic model that accounts for information about both differential regulation and network topology, combined with a genetic algorithm and rigorous significance testing. The second method, called the MeanRank test, is a global one-sample location test, which is based on the mean ranks across replicates and internally estimates and controls the false discovery rate. The test successfully deals with small numbers of replicates, missing values without the need for imputation, non-normally distributed expression levels, and non-identical distribution of up- and down-regulated features, while its statistical power scales well with the number of replicates. The third method is a biomarker discovery workflow that aims at identifying a multivariate response prediction biomarker for treatment of non-small cell lung cancer cell lines with the kinase inhibitor dasatinib from phosphoproteomic data (referred to as the NSCLC biomarker). An elaborate biomarker workflow based on robust feature selection in combination with a support vector machine (SVM) was designed in order to find a phosphorylation signature that accurately predicts the response to dasatinib. The fourth method, called Pareto biomarker, extends the previous NSCLC biomarker workflow by optimizing not only one single objective (i.e. best possible separation of responders and non-responders), but also the objectives signature size and relevance (i.e. association of signature proteins with dasatinib's main target). This is achieved by employing a multiobjective optimization algorithm based on the principle of Pareto optimality, which allows for a simultaneous optimization of all three objectives. These novel data analysis methods were thoroughly validated using experimental data and compared to existing methods. They can be used on their own, or they can be combined into a joint workflow in order to efficiently answer complex biological questions in the field of large-scale omics in general and phosphoproteomics in particular.
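The principle of Pareto optimality named in the abstract above can be sketched as a simple non-dominated filter. This is only an illustration of the principle, not the thesis's optimization algorithm. The three objectives are assumed here to be expressed so that smaller is better (classification error, signature size, and a hypothetical distance to the drug target), and the candidate values are invented for the example.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Candidate]) -> List[Candidate]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical signatures: (classification error, signature size, distance to target)
candidates = [
    (0.10, 12, 2.0),  # dominated by the next entry (same error and distance, larger size)
    (0.10, 8, 2.0),
    (0.05, 20, 3.0),
    (0.20, 5, 1.0),
    (0.25, 6, 1.5),   # dominated by (0.20, 5, 1.0) on all three objectives
]

front = pareto_front(candidates)
print(front)
```

Each signature left on the front represents a different trade-off between the three objectives; none of them can be improved on one objective without getting worse on another.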

    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views to the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures.

    Symbiont-host interactome mapping reveals effector-targeted modulation of hormone networks and activation of growth promotion

    Plants have benefited from interactions with symbionts for coping with challenging environments since the colonisation of land. The mechanisms of symbiont-mediated beneficial effects, and their similarities and differences to pathogen strategies, are mostly unknown. Here, we use 106 (effector-) proteins, secreted by the symbiont Serendipita indica (Si) to modulate host physiology, to map interactions with Arabidopsis thaliana host proteins. Using integrative network analysis, we show significant convergence on target proteins shared with pathogens and exclusive targeting of Arabidopsis proteins in the phytohormone signalling network. Functional in planta screening and phenotyping of Si effectors and interacting proteins reveals previously unknown hormone functions of Arabidopsis proteins and direct beneficial activities mediated by effectors in Arabidopsis. Thus, symbionts and pathogens target a shared molecular microbe-host interface. At the same time, Si effectors specifically target the plant hormone network and constitute a powerful resource for elucidating the signalling network function and boosting plant productivity.

    Multi-species integrative biclustering

    We describe an algorithm, multi-species cMonkey, for the simultaneous biclustering of heterogeneous multiple-species data collections and apply the algorithm to a group of bacteria containing Bacillus subtilis, Bacillus anthracis, and Listeria monocytogenes. The algorithm reveals evolutionary insights into the surprisingly high degree of conservation of regulatory modules across these three species and allows data and insights from well-studied organisms to complement the analysis of related but less well-studied organisms.

    A Gibbs sampling strategy for mining of protein-protein interaction networks and protein structures

    Complex networks are general and can be used to model phenomena that belong to different fields of research, from biochemical applications to social networks. However, due to the intrinsic complexity of real networks, their analysis can be computationally demanding. Recently, several statistical and probabilistic analysis approaches have been designed that have proven much faster, more flexible and more effective than deterministic algorithms. Among statistical methods, Gibbs sampling is one of the simplest and most powerful algorithms for solving complex optimization problems, and it has been applied in different contexts. It has shown its effectiveness in computational biology, but in sequence analysis rather than in network analysis. One approach to analyzing complex networks is to compare them, in order to identify similar patterns of interconnections and predict the function or the role of some unknown nodes. This motivated the main goal of the thesis: designing and implementing novel graph mining techniques based on Gibbs sampling to compare two or more complex networks. The methodology is domain-independent and can work on any complex system of interacting entities with associated attributes. However, in this thesis we focus our attention on protein analysis, overcoming the strong current limitations in this area. Proteins can be analyzed from two different points of view: (i) an internal perspective, i.e. the 3D structure of the protein, and (ii) an external perspective, i.e. the interactions with other macromolecules. In both cases, a comparative analysis with other proteins of the same or distinct species can reveal important clues about the function of the protein and about evolutionary convergences or divergences between different organisms in the way a specific function or process is carried out. First, we present two methods based on Gibbs sampling for the comparative analysis of protein-protein interaction networks: GASOLINE and SPECTRA.
GASOLINE is a stochastic and greedy algorithm to find similar groups of interacting proteins in two or more networks. It can align many networks, and does so more quickly than the state-of-the-art methods. SPECTRA is a framework to retrieve and compare networks of proteins that interact with one another in specific healthy or tumor tissues. The aim in this case is to identify changes in protein concentration or protein "behaviour" across different tissues. SPECTRA is an adaptation of GASOLINE for weighted protein-protein interaction networks with gene expressions as node weights. It is the first algorithm proposed for multiple comparison of tissue-specific interaction networks. We also describe a Gibbs sampling based algorithm for 3D protein structure comparison, called PROPOSAL, which finds local structural similarities across two or more protein structures. Experimental results confirm our computational predictions and show that the proposed algorithms are much faster and in most cases more accurate than existing methods.