89 research outputs found

    Comparative genomics of isolates of a Pseudomonas aeruginosa epidemic strain associated with chronic lung infections of cystic fibrosis patients

    Pseudomonas aeruginosa is the main cause of fatal chronic lung infections among individuals suffering from cystic fibrosis (CF). During the past 15 years, particularly aggressive strains transmitted among CF patients have been identified, initially in Europe and more recently in Canada. The aim of this study was to generate high-quality genome sequences for 7 isolates of the Liverpool epidemic strain (LES) from the United Kingdom and Canada representing different virulence characteristics in order to: (1) associate comparative genomics results with virulence factor variability and (2) identify genomic and/or phenotypic divergence between the two geographical locations. We performed phenotypic characterization of pyoverdine, pyocyanin, motility, biofilm formation, and proteolytic activity. We also assessed the degree of virulence using the Dictyostelium discoideum amoeba model. Comparative genomics analysis revealed at least one large deletion (40-50 kb) in 6 out of the 7 isolates compared to the reference genome of LESB58. These deletions correspond to prophages, which are known to increase the competitiveness of LESB58 in chronic lung infection. We also identified 308 non-synonymous polymorphisms, of which 28 were associated with virulence determinants and 52 with regulatory proteins. At the phenotypic level, isolates showed extensive variability in production of pyocyanin, pyoverdine, proteases and biofilm as well as in swimming motility, while being predominantly avirulent in the amoeba model. Isolates from the two continents were phylogenetically and phenotypically indistinguishable. Most regulatory mutations were isolate-specific and 29% of them were predicted to have high functional impact. Therefore, polymorphism in regulatory genes is likely to be an important basis for phenotypic diversity among LES isolates, which in turn might contribute to this strain's adaptability to varying conditions in the CF lung.

    Scoring Protein Relationships in Functional Interaction Networks Predicted from Sequence Data

    The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with high confidence and coverage. We use the network for predicting functions of uncharacterized proteins.

    CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. https://doi.org/10.1186/1471-2105-12-35

    An Integrative Multi-Network and Multi-Classifier Approach to Predict Genetic Interactions

    Genetic interactions occur when a combination of mutations results in a surprising phenotype. These interactions capture functional redundancy, and thus are important for predicting function, dissecting protein complexes into functional pathways, and exploring the mechanistic underpinnings of common human diseases. Synthetic sickness and lethality are the most studied types of genetic interactions in yeast. However, even in yeast, only a small proportion of gene pairs have been tested for genetic interactions due to the large number of possible combinations of gene pairs. To expand the set of known synthetic lethal (SL) interactions, we have devised an integrative, multi-network approach for predicting these interactions that significantly improves upon the existing approaches. First, we defined a large number of features for characterizing the relationships between pairs of genes from various data sources. In particular, these features are independent of the known SL interactions, in contrast to some previous approaches. Using these features, we developed a non-parametric multi-classifier system for predicting SL interactions that enabled the simultaneous use of multiple classification procedures. Several comprehensive experiments demonstrated that the SL-independent features in conjunction with the advanced classification scheme led to an improved performance when compared to the current state-of-the-art method. Using this approach, we derived the first yeast transcription factor genetic interaction network, part of which was well supported by literature. We also used this approach to predict SL interactions between all non-essential gene pairs in yeast (http://sage.fhcrc.org/downloads/downloads/predicted_yeast_genetic_interactions.zip). This integrative approach is expected to be more effective and robust in uncovering new genetic interactions from the tens of millions of unknown gene pairs in yeast and from the hundreds of millions of gene pairs in higher organisms like mouse and human, in which very few genetic interactions have been identified to date.

    Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

    Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity > 80%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications. Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.

    Spatial analysis of biomineralization associated gene expression from the mantle organ of the pearl oyster Pinctada maxima

    Background: Biomineralization is a process encompassing all mineral-containing tissues produced within an organism. One of the most dynamic examples of this process is the formation of the mollusk shell, comprising a variety of crystal phases and microstructures. The organic component incorporated within the shell is said to dictate this architecture. However, general understanding of how this process is achieved remains ambiguous. The mantle is a conserved organ involved in shell formation throughout mollusks. Specifically, the mantle is thought to be responsible for secreting the protein component of the shell. This study employs molecular approaches to determine the spatial expression of genes within the mantle tissue to further elucidate shell biomineralization. Results: A microarray platform was custom generated (PmaxArray 1.0) from the pearl oyster Pinctada maxima. PmaxArray 1.0 consists of 4992 expressed sequence tags (ESTs) originating from mantle tissue. This microarray was used to analyze the spatial expression of ESTs throughout the mantle organ. The mantle was dissected into five discrete regions and analyzed for differential gene expression with PmaxArray 1.0. Over 2000 ESTs were determined to be differentially expressed among the tissue sections, identifying five major expression regions. In situ hybridization validated and further localized the expression for a subset of these ESTs. Comparative sequence similarity analysis of these ESTs revealed that a number of the transcripts were novel while others showed significant sequence similarities to previously characterized shell-related genes.

    ResBoost: characterizing and predicting catalytic residues in enzymes

    Background: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. Results: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). Conclusion: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.

    Global Analysis of Proline-Rich Tandem Repeat Proteins Reveals Broad Phylogenetic Diversity in Plant Secretomes

    Cell walls, constructed by precisely choreographed changes in the plant secretome, play critical roles in plant cell physiology and development. Along with structural polysaccharides, secreted proline-rich Tandem Repeat Proteins (TRPs) are important for cell wall function, yet the evolutionary diversity of these structural TRPs remains virtually unexplored. Using a systems-level computational approach to analyze taxonomically diverse plant sequence data, we identified 31 distinct Pro-rich TRP classes targeted for secretion. This analysis expands upon the known phylogenetic diversity of extensins, the most widely studied class of wall structural proteins, and demonstrates that extensins evolved before plant vascularization. Our results also show that most Pro-rich TRP classes have unexpectedly restricted evolutionary distributions, revealing considerable differences in plant secretome signatures that define unexplored diversity.

    clusterMaker: a multi-algorithm clustering plugin for Cytoscape

    Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.