    Increasing workflow development speed and reproducibility with Vectools [version 2; referees: 2 approved]

    Despite advances in bioinformatics, custom scripts remain a source of difficulty, slowing workflow development and hampering reproducibility. Here, we introduce Vectools, a command-line tool suite that reduces reliance on custom scripts and improves reproducibility by offering a wide range of common, easy-to-use functions for table and vector manipulation. Vectools also offers a number of vector-related functions to speed up workflow development, such as simple machine learning and common statistics functions.

    Predicting genome-scale Arabidopsis-Pseudomonas syringae interactome using domain and interolog-based approaches

    Background: Every year pathogenic organisms cause billions of dollars' worth of damage to crops and livestock. In agriculture, the study of plant-microbe interactions demands special attention in order to develop management strategies for the destructive pathogen-induced diseases that cause huge crop losses worldwide every year. Pseudomonas syringae is a major bacterial leaf pathogen that causes diseases in a wide range of plant species. Among its various strains, pathovar tomato strain DC3000 (PstDC3000) is reported to infect the plant host Arabidopsis thaliana and has thus been accepted as a model system for experimental characterization of the molecular dynamics of plant-pathogen interactions. Protein-protein interactions (PPIs) play a critical role in initiating pathogenesis and maintaining infection. Understanding the PPI network between a host and a pathogen is a critical step in studying the molecular basis of pathogenesis. Large-scale experimental studies of PPIs are scarce, and high-throughput experimental results show a high false-positive rate. Hence, there is a need to develop efficient computational models that predict the interactions between host and pathogen at genome scale and find novel candidate effectors and/or their targets.

    Results: In this study, we used two computational approaches, the interolog-based and the domain-based methods, to predict the interactions between Arabidopsis and PstDC3000 at genome scale. The interolog method relies on protein sequence similarity to conduct the PPI prediction. A Pseudomonas protein and an Arabidopsis protein are predicted to interact with each other if an experimentally verified interaction exists between their respective homologous proteins in another organism. The domain-based method uses domain interaction information, which is derived from known protein 3D structures, to infer the potential PPIs. If a Pseudomonas protein and an Arabidopsis protein contain an interacting domain pair, one can expect the two proteins to interact with each other. The interolog-based method predicts ~0.79M PPIs involving around 7700 Arabidopsis and 1068 Pseudomonas proteins in the full genome. The domain-based method predicts 85650 PPIs comprising 11432 Arabidopsis and 887 Pseudomonas proteins. Further, around 11000 PPIs were predicted by both methods and form a consensus set.

    Conclusion: The present work predicts the protein-protein interaction network between Arabidopsis thaliana and Pseudomonas syringae pv. tomato DC3000 at genome-wide scale with high confidence. Although the predicted PPIs may contain some false positives, the computational methods provide a reasonable number of interactions that can be further validated by high-throughput experiments. This can be a useful resource for the plant community in characterizing host-pathogen interactions in the Arabidopsis-Pseudomonas system. Further, these prediction models can be applied to agriculturally relevant crops.

    Peer reviewed. National Institute for Microbial Forensics and Food and Agricultural Biosecurity. Biochemistry and Molecular Biolog
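
    The interolog rule described in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function name, the input dictionaries, and the toy identifiers are all hypothetical, and the homolog sets are assumed to have been computed beforehand by a sequence-similarity search.

```python
def predict_interologs(known_ppis, host_homologs, pathogen_homologs):
    """Transfer template interactions to host-pathogen pairs via homology.

    known_ppis: iterable of (protein_a, protein_b) pairs that are
        experimentally verified in some template organism.
    host_homologs: dict mapping each host protein to the set of template
        proteins it is homologous to (hypothetical, precomputed input).
    pathogen_homologs: same mapping for pathogen proteins.
    """
    # Index the template interactions symmetrically.
    partners = {}
    for a, b in known_ppis:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    predicted = set()
    for host, host_templates in host_homologs.items():
        # Template proteins that interact with any homolog of this host protein.
        reachable = set()
        for template in host_templates:
            reachable |= partners.get(template, set())
        for pathogen, pathogen_templates in pathogen_homologs.items():
            # Predict an interaction if any pathogen homolog is among them.
            if reachable & pathogen_templates:
                predicted.add((host, pathogen))
    return predicted


# Toy example with made-up identifiers.
known = [("T1", "T2"), ("T3", "T4")]
host = {"AT1": {"T1"}, "AT2": {"T5"}}
pathogen = {"PS1": {"T2"}, "PS2": {"T3"}}
result = predict_interologs(known, host, pathogen)
```

    The domain-based rule has the same shape: replace the homolog sets with the domains each protein contains and the template interactions with known interacting domain pairs.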

    LacSubPred: Predicting subtypes of Laccases, an important lignin metabolism-related enzyme class, using in silico approaches

    Background: Laccases (E.C. 1.10.3.2) are multi-copper oxidases that have gained importance in many industries such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Their usefulness stems from the ability to act on a diverse range of phenolic compounds such as o-/p-quinols, aminophenols, polyphenols, polyamines, aryl diamines, and aromatic thiols. Despite acting on a wide range of compounds as a family, individual Laccases often exhibit distinctive and varied substrate ranges. This is likely due to Laccases' involvement in many metabolic roles across diverse taxa. Classification systems for multi-copper oxidases have been developed using multiple sequence alignments; however, these systems seem to largely follow species taxonomy rather than substrate ranges, enzyme properties, or specific function. It has been suggested that the roles and substrates of various Laccases are related to their optimal pH. This is consistent with the observation that fungal Laccases usually prefer acidic conditions, whereas plant and bacterial Laccases prefer basic conditions. Based on these observations, we hypothesize that a descriptor-based unsupervised learning system could generate a homology-independent classification system that better describes the functional properties of Laccases.

    Results: In this study, we first used an unsupervised learning approach to develop a novel homology-independent Laccase classification system. Of the descriptors considered, physicochemical properties showed the best performance. Physicochemical properties divided the Laccases into twelve subtypes. Analysis of the clusters using a t-test revealed that the majority of the physicochemical descriptors had statistically significant differences between the classes. Feature selection identified the most important features as negatively charged residues, the peptide isoelectric point, and acidic or amidic residues. Secondly, to allow for classification of new Laccases, a supervised learning system was developed from the clusters. The models showed high performance, with an overall accuracy of 99.03%, error of 0.49%, MCC of 0.9367, precision of 94.20%, sensitivity of 94.20%, and specificity of 99.47% in a 5-fold cross-validation test. In an independent test, our models still provide a high accuracy of 97.98%, error rate of 1.02%, MCC of 0.8678, precision of 87.88%, sensitivity of 87.88% and specificity of 98.90%.

    Conclusion: This study provides a useful classification system for better understanding Laccases from the perspective of their physicochemical properties. We also developed a publicly available web tool for the characterization of Laccase protein sequences (http://lacsubpred.bioinfo.ucr.edu/). Finally, the programs used in the study are made available for researchers interested in applying the system to other enzyme classes (https://github.com/tweirick/SubClPred).

    Peer reviewed. National Institute for Microbial Forensics and Food and Agricultural Biosecurity. Biochemistry and Molecular Biolog
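
    For readers unfamiliar with the performance measures quoted above, the following sketch computes them from the counts of a binary confusion matrix using the standard definitions. The function name and the example counts are hypothetical, and the abstract does not state how the authors aggregated these measures across the twelve subtypes, so this shows only the per-class calculation.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Made-up counts for one class treated as "positive" against all others.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

    With imbalanced classes, as in a one-versus-rest setting with many subtypes, accuracy and specificity can be high even when precision and sensitivity are modest, which is one reason MCC is commonly reported alongside them.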

    Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

    Background: Plastids are an important component of plant cells: they are the site of manufacture and storage of chemical compounds used by the cell, they contain pigments, such as those used in photosynthesis, and they are involved in starch synthesis and storage, cell color, etc. They are essential organelles of the plant cell and are also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteomes of these genomes need annotation at a faster pace. One such annotation need is an automated system that can accurately distinguish between plastid and non-plastid proteins and further classify plastid types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity search and machine learning.

    Results: In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing plastid from non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features are used to develop the SVM models: amino acid composition, dipeptide composition, pseudo amino acid composition, N-terminal-Center-C-terminal composition, and protein physicochemical properties. Overall, the dipeptide composition-based module shows the best performance, with an accuracy of 86.80% and a Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and an accuracy of 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better, with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance, with about 50% prediction accuracy for distinguishing plastids from non-plastids and only 20% in classifying the various plastid types, indicating the need for and importance of machine learning algorithms.

    Conclusion: The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred, at http://bioinfo.okstate.edu/PLpred/ for real-time identification and characterization. We believe this tool will be very useful in the functional annotation of various genomes.

    Peer reviewed. National Institute for Microbial Forensics and Food and Agricultural Biosecurity. Biochemistry and Molecular Biolog
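
    As an illustration of the best-performing feature, the sketch below turns a protein sequence into a 400-dimensional dipeptide composition vector. It is not the PLpred code: the function name is hypothetical, and normalizing by the number of valid adjacent pairs (as fractions rather than percentages) is an assumption, since the abstract does not give the exact formula.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each dipeptide among adjacent residue pairs."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Skip pairs containing non-standard residues such as X or B.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence: the adjacent pairs are AC, CA, AC.
vector = dipeptide_composition("ACAC")
```

    A vector of this kind, computed for every protein, is the sort of fixed-length input an SVM classifier requires, regardless of sequence length.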

    The Genome of a Thermo Tolerant, Pathogenic Albino Aspergillus fumigatus

    Biotechnologists are interested in thermo tolerant fungi to manufacture enzymes active and stable at high temperatures, because they provide improved catalytic efficiency, strengthen enzyme-substrate interactions, accelerate substrate-enzyme conversion rates, enhance mass transfer, lower substrate viscosity, lessen contamination risk and offer the potential for enzyme recycling. Members of the genus Aspergillus live a wide variety of lifestyles: some have GRAS status and are routinely employed in food processing, while others, such as Aspergillus fumigatus, are human pathogens. A. fumigatus produces melanins: pyomelanin protects the fungus against reactive oxygen species, and DHN melanin, produced by the pksP gene cluster, confers the gray-greenish color. pksP mutants are attenuated in virulence. Here we report the genomic DNA sequence of a thermo tolerant albino Aspergillus isolated from composted rain forest floors. Unexpectedly, the nucleotide sequence was 95.7% identical to that reported for Aspergillus fumigatus Af293. Genome size and predicted gene models were also highly similar; however, differences in DNA content and conservation were observed. The albino strain, classified as Aspergillus fumigatus var. niveus, had 160 gene models not present in A. fumigatus Af293, and A. fumigatus Af293 had 647 not found in the albino strain. Furthermore, the major pigment-generating gene cluster pksP appeared to have undergone genomic rearrangements, and a key tyrosinase present in many aspergilli was missing from the genome. Remarkably, however, despite the lack of pigmentation, A. fumigatus var. niveus killed neutropenic mice and survived macrophage engulfment at rates similar to those of A. fumigatus Af293.

    Long Non-coding RNAs in Endothelial Biology

    In recent years, the role of RNA has expanded to the extent that protein-coding RNAs are now the minority, with a variety of non-coding RNAs (ncRNAs) comprising the majority of RNAs in higher organisms. A major contributor to this shift in understanding is RNA sequencing (RNA-seq), which provides a largely unconstrained method for monitoring the status of RNA from whole organisms down to a single cell. This observational power presents both challenges and new opportunities, which require specialized bioinformatics tools to extract knowledge from the data and the ability to reuse data for multiple studies. In this review, we summarize the current status of long non-coding RNA (lncRNA) research in endothelial biology. We then cover computational methods for identifying, annotating, and characterizing lncRNAs in the heart, especially in endothelial cells.

    Investigation of RNA Editing Sites within Bound Regions of RNA-Binding Proteins

    Studies in epitranscriptomics indicate that RNA is modified by a variety of enzymes. Among these RNA modifications, adenosine to inosine (A-to-I) RNA editing occurs frequently in the mammalian transcriptome. These RNA editing sites can be detected directly from RNA sequencing (RNA-seq) data by examining nucleotide changes from adenosine (A) to guanine (G), which substitutes for inosine (I). However, a careful investigation of such nucleotide changes must be conducted to distinguish sequencing errors and genomic mutations from the genuine editing sites. Building upon our recent introduction of an easy-to-use bioinformatics tool, RNA Editor, to detect RNA editing events from RNA-seq data, we examined the extent to which RNA editing events affect the binding of RNA-binding proteins (RBPs). Using bioinformatic techniques, we found that RNA editing sites occur frequently in RBP-bound regions. Moreover, RNA editing sites were more frequent in these regions when RNA editing islands, which are regions in which RNA editing sites are present in clusters, were examined. When the binding of one RBP, human antigen R [HuR; encoded by ELAV-like protein 1 (ELAV1)], was quantified experimentally, its binding was reduced upon silencing of the RNA editing enzyme adenosine deaminases acting on RNA (ADAR) compared to the control, suggesting that the presence of RNA editing islands influences HuR binding to its target regions. These data indicate that RNA editing is an important mediator of RBP–RNA interactions, a mechanism which likely constitutes an additional mode of post-transcriptional gene regulation in biological systems.
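
    The overlap analysis described above, counting editing sites that fall inside RBP-bound regions, can be sketched as follows. This is not the RNA Editor implementation or the authors' analysis code: the function names, the coordinate convention (0-based, half-open intervals), and the toy coordinates are all assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_sites_in_regions(sites, regions):
    """Count editing sites that lie inside any bound region.

    sites: dict mapping chromosome -> list of site positions.
    regions: dict mapping chromosome -> list of (start, end) intervals.
    """
    inside = 0
    for chrom, positions in sites.items():
        merged = merge_intervals(regions.get(chrom, []))
        starts = [start for start, _ in merged]
        for pos in positions:
            # Find the last merged region starting at or before the site.
            i = bisect_right(starts, pos) - 1
            if i >= 0 and pos < merged[i][1]:
                inside += 1
    return inside


# Made-up coordinates: two of the three chr1 sites fall in bound regions.
sites = {"chr1": [5, 15, 25], "chr2": [7]}
regions = {"chr1": [(0, 10), (8, 12), (20, 30)]}
n_inside = count_sites_in_regions(sites, regions)
```

    Merging the regions first ensures that a site covered by two overlapping binding peaks is counted once rather than twice.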