1,659 research outputs found

    Multivariate Models and Algorithms for Systems Biology

    Rapid advances in high-throughput data acquisition technologies, such as microarrays and next-generation sequencing, have enabled scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developing effective computational methods for analyzing data generated from such platforms. In this dissertation, we address some of these challenges. We divide our work into two parts. In the first part, we present a suite of multivariate approaches for the reliable discovery of gene clusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on the reconstruction of signal transduction mechanisms in the signaling pathway components. We propose gene-set-based approaches for inferring the structure of a signaling pathway. First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. The informed-case model generalizes the previously known blind-case model by accommodating prior knowledge of replication mechanisms. Second, we generalize the blind-case model by designing a two-component mixture model. Our idea is to strike an optimal balance between a fully constrained correlation structure and an unconstrained one. Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements. We utilize our correlation estimators for clustering real-world replicated complete and incomplete molecular profiling data sets. The above three components constitute the first part of the dissertation.
For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout, we treat gene sets as variables and the associated gene orderings as random. The first algorithm has been developed under the Gibbs sampling framework and the second algorithm utilizes the framework of simulated annealing. Finally, we summarize our findings and discuss possible future directions.
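
To make the idea of a correlation structure estimated from replicated, partly missing measurements more concrete, here is a minimal sketch of a naive baseline. It is not the informed-case, blind-case, mixture, or EM estimator described in the abstract: it simply averages whatever replicates were observed for each sample and then computes a Pearson correlation between genes. The function names, the `None` convention for missing values, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def average_replicates(replicates):
    """Average replicate profiles of one gene, sample by sample.

    `replicates` is a list of profiles (one list of values per replicate);
    a missing measurement is marked with None (an assumed convention).
    Assumes every sample has at least one observed replicate.
    """
    n_samples = len(replicates[0])
    averaged = []
    for j in range(n_samples):
        observed = [r[j] for r in replicates if r[j] is not None]
        averaged.append(sum(observed) / len(observed))
    return averaged


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(genes):
    """Gene-by-gene correlation matrix from replicate-averaged profiles."""
    profiles = [average_replicates(reps) for reps in genes]
    return [[pearson(p, q) for q in profiles] for p in profiles]


# Toy data: three genes, three samples, up to two replicates each.
toy_genes = [
    [[1.0, 2.0, 3.0], [1.2, None, 2.8]],  # gene A, one missing value
    [[2.0, 4.0, 6.0], [2.0, 4.0, 6.0]],   # gene B, rises with A
    [[3.0, 2.0, 1.0]],                    # gene C, falls as A rises
]
corr = correlation_matrix(toy_genes)
```

A matrix such as `corr` could then be turned into a distance (for example `1 - corr[i][j]`) and passed to any standard clustering routine. Unlike this baseline, a model-based estimator accounts for the replication mechanism and for missing values explicitly instead of discarding that information through averaging.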

    A poxvirus Bcl-2-like gene family involved in regulation of host immune response: sequence similarity and evolutionary history

    BACKGROUND: Poxviruses evade the immune system of the host through the action of virally encoded inhibitors that block various signalling pathways. The exact number of viral inhibitors is not yet known. Several members of the vaccinia virus A46 and N1 families, with a Bcl-2-like structure, are involved in the regulation of the host innate immune response, where they act non-redundantly at different levels of the Toll-like receptor signalling pathway. N1 also maintains an anti-apoptotic effect by acting similarly to cellular Bcl-2 proteins. Whether there are related families that could have similar functions is the main subject of this investigation. RESULTS: We describe the sequence similarity existing among poxvirus A46, N1, N2 and C1 protein families, which share a common domain of approximately 110-140 amino acids at their C-termini that spans the entire N1 sequence. Secondary structure and fold recognition predictions suggest that this domain presents an all-alpha-helical fold compatible with the Bcl-2-like structures of vaccinia virus proteins N1, A52, B15 and K7. We propose that these protein families should be merged into a single one. We describe the phylogenetic distribution of this family and reconstruct its evolutionary history, which indicates an extensive gene gain in ancestral viruses and a further stabilization of its gene content. CONCLUSIONS: Based on the sequence/structure similarity, we propose that other members with unknown function, like vaccinia virus N2, C1, C6 and C16/B22, might have a similar role in the suppression of host immune response as A46, A52, B15 and K7, by antagonizing the TLR signalling pathways at different levels.

    Direct single-molecule observation of calcium-dependent misfolding in human neuronal calcium sensor-1

    Neurodegenerative disorders are strongly linked to protein misfolding, and crucial to their explication is a detailed understanding of the underlying structural rearrangements and pathways that govern the formation of misfolded states. Here we use single-molecule optical tweezers to monitor misfolding reactions of the human neuronal calcium sensor-1, a multispecific EF-hand protein involved in neurotransmitter release and linked to severe neurological diseases. We directly observed two misfolding trajectories leading to distinct kinetically trapped misfolded conformations. Both trajectories originate from an on-pathway intermediate state and compete with native folding in a calcium-dependent manner. The relative probability of the different trajectories could be affected by modulating the relaxation rate of applied force, demonstrating an unprecedented real-time control over the free-energy landscape of a protein. Constant-force experiments in combination with hidden Markov analysis revealed the free-energy landscape of the misfolding transitions under both physiological and pathological calcium concentrations. Remarkably for a calcium sensor, we found that higher calcium concentrations increased the lifetimes of the misfolded conformations, slowing productive folding to the native state. We propose a rugged, multidimensional energy landscape for neuronal calcium sensor-1 and speculate on a direct link between protein misfolding and calcium dysregulation that could play a role in neurodegeneration.
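
As a rough illustration of how state occupancies from a constant-force trace relate to free-energy differences, the sketch below applies the standard Boltzmann relation, ΔG = -k_B T ln(P_b / P_a). It is not the authors' hidden Markov analysis; it assumes the state assignment has already been done and that the two states are in equilibrium at the applied force, which may not hold for kinetically trapped conformations. The function name and the dwell times are hypothetical.

```python
from math import log


def free_energy_difference_kT(occupancy_a, occupancy_b):
    """Return G_b - G_a in units of k_B*T from equilibrium occupancies.

    The occupancies can be probabilities or total dwell times, since only
    their ratio enters the Boltzmann relation.
    """
    if occupancy_a <= 0 or occupancy_b <= 0:
        raise ValueError("both states must have positive occupancy")
    return -log(occupancy_b / occupancy_a)


# Hypothetical total dwell times (seconds) in two states of one trace.
native_time = 90.0
misfolded_time = 10.0
delta_g = free_energy_difference_kT(native_time, misfolded_time)
print(f"misfolded state lies {delta_g:.2f} k_B*T above the native state")
```

A positive value means state b is less populated, and therefore higher in free energy, than state a under the conditions of the measurement.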

    MANET: tracing evolution of protein architecture in metabolic networks

    BACKGROUND: Cellular metabolism can be characterized by networks of enzymatic reactions and transport processes capable of supporting cellular life. Our aim is to find evolutionary patterns and processes embedded in the architecture and function of modern metabolism, using information derived from structural genomics. DESCRIPTION: The Molecular Ancestry Network (MANET) project traces evolution of protein architecture in biomolecular networks. We describe metabolic MANET, a database that links information in the Structural Classification of Proteins (SCOP), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and phylogenetic reconstructions depicting the evolution of protein fold architecture. Metabolic MANET literally 'paints' the ancestries of enzymes derived from rooted phylogenomic trees directly onto over one hundred metabolic subnetworks, enabling the study of evolutionary patterns at global and local levels. An initial analysis of painted subnetworks reveals widespread enzymatic recruitment and an early origin of amino acid metabolism. CONCLUSION: MANET maps evolutionary relationships directly and globally onto biological networks, and can generate and test hypotheses related to evolution of metabolism. We anticipate its use in the study of other networks, such as signaling and other protein-protein interaction networks.

    MiST 3.0: an updated microbial signal transduction database with an emphasis on chemosensory systems

    Bacteria and archaea employ dedicated signal transduction systems that modulate gene expression, second-messenger turnover, quorum sensing, biofilm formation, motility, host-pathogen and beneficial interactions. The updated MiST database provides a comprehensive classification of microbial signal transduction systems. This update is a result of a substantial scaling to accommodate constantly growing microbial genomic data. More than 125 000 genomes, 516 million genes and almost 100 million unique protein sequences are currently stored in the database. For each bacterial and archaeal genome, MiST 3.0 provides a complete signal transduction profile, thus facilitating theoretical and experimental studies on signal transduction and gene regulation. New software infrastructure and a distributed pipeline implemented in MiST 3.0 enable regular genome updates based on the NCBI RefSeq database. A novel MiST feature is the integration of unique profile HMMs to link complex chemosensory systems with corresponding chemoreceptors in bacterial and archaeal genomes. The data can be explored online or via a RESTful API (freely available at https://mistdb.com).
