209 research outputs found

    HPIDB - a unified resource for host-pathogen interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein-protein interactions (PPIs) play a crucial role in initiating infection in a host-pathogen system. Identification of these PPIs is important for understanding the underlying biological mechanism of infection and identifying putative drug targets. Database resources for studying host-pathogen systems are scarce and are either host specific or dedicated to specific pathogens.</p> <p>Results</p> <p>Here we describe "HPIDB” a host-pathogen PPI database, which will serve as a unified resource for host-pathogen interactions. Specifically, HPIDB integrates experimental PPIs from several public databases into a single, non-redundant web accessible resource. The database can be searched with a variety of options such as sequence identifiers, symbol, taxonomy, publication, author, or interaction type. The output is provided in a tab delimited text file format that is compatible with Cytoscape, an open source resource for PPI visualization. HPIDB allows the user to search protein sequences using BLASTP to retrieve homologous host/pathogen sequences. For high-throughput analysis, the user can search multiple protein sequences at a time using BLASTP and obtain results in tabular and sequence alignment formats. The taxonomic categorization of proteins (bacterial, viral, fungi, etc.) involved in PPI enables the user to perform category specific BLASTP searches. In addition, a new tool is introduced, which allows searching for homologous host-pathogen interactions in the HPIDB database. </p> <p>Conclusions</p> <p>HPIDB is a unified, comprehensive resource for host-pathogen PPIs. The user interface provides new features and tools helpful for studying host-pathogen interactions. HPIDB can be accessed at <url>http://agbase.msstate.edu/hpi/main.html</url>.</p

    An automated proteomic data analysis workflow for mass spectrometry

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is a commonly used algorithm for identifying peptides and proteins from two dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS<sup>2</sup>) data. A number of proteomic pipelines that facilitate high throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate the rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression using parametric and non-parametric statistics.</p> <p>Results</p> <p>The input for our workflow is Bioworks™ 3.2 Sequest (or a later version, including cluster) output in XML format. We use a decoy database approach to assign probability to peptide identifications. The user has the option to select "quality thresholds" on peptide identifications based on the P value. We also estimate probability for protein identification. Proteins identified with peptides at a user-specified threshold value from biological experiments are grouped as either control or treatment for further analysis in ProtQuant. ProtQuant utilizes a parametric (ANOVA) method, for calculating differences in protein expression based on the quantitative measure ΣXcorr. Alternatively ProtQuant output can be further processed using non-parametric Monte-Carlo resampling statistics to calculate P values for differential expression. Correction for multiple testing of ANOVA and resampling P values is done using Benjamini and Hochberg's method. The results of these statistical analyses are then combined into a single output file containing a comprehensive protein list with probabilities and differential expression analysis, associated P values, and resampling statistics.</p> <p>Conclusion</p> <p>For biologists carrying out proteomics by mass spectrometry, our workflow facilitates automated, easy to use analyses of Bioworks (3.2 or later versions) data. All the methods used in the workflow are peer-reviewed and as such the results of our workflow are compliant with proteomic data submission guidelines to public proteomic data repositories including PRIDE. Our workflow is a necessary intermediate step that is required to link proteomics data to biological knowledge for generating testable hypotheses.</p

    Transcriptomic Analysis of Peritoneal Cells in a Mouse Model of Sepsis: Confirmatory and Novel Results in Early and Late Sepsis.

    Get PDF
    Background The events leading to sepsis start with an invasive infection of a primary organ of the body followed by an overwhelming systemic response. Intra-abdominal infections are the second most common cause of sepsis. Peritoneal fluid is the primary site of infection in these cases. A microarray-based approach was used to study the temporal changes in cells from the peritoneal cavity of septic mice and to identify potential biomarkers and therapeutic targets for this subset of sepsis patients. Results We conducted microarray analysis of the peritoneal cells of mice infected with a non-pathogenic strain of Escherichia coli. Differentially expressed genes were identified at two early (1 h, 2 h) and one late time point (18 h). A multiplexed bead array analysis was used to confirm protein expression for several cytokines which showed differential expression at different time points based on the microarray data. Gene Ontology based hypothesis testing identified a positive bias of differentially expressed genes associated with cellular development and cell death at 2 h and 18 h respectively. Most differentially expressed genes common to all 3 time points had an immune response related function, consistent with the observation that a few bacteria are still present at 18 h. Conclusions Transcriptional regulators like PLAGL2, EBF1, TCF7, KLF10 and SBNO2, previously not described in sepsis, are differentially expressed at early and late time points. Expression pattern for key biomarkers in this study is similar to that reported in human sepsis, indicating the suitability of this model for future studies of sepsis, and the observed differences in gene expression suggest species differences or differences in the response of blood leukocytes and peritoneal leukocytes

    Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The identification of non-coding transcripts in human, mouse, and <it>Escherichia coli </it>has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions like gene regulation, stress and virulence. However, very little is known about non-coding transcripts in <it>Streptococcus pneumoniae </it>(pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome wide mRNA profiling as well as identification of novel transcripts at a high-resolution.</p> <p>Results</p> <p>Here, we describe a high-resolution transcription map of the <it>S. pneumoniae </it>clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to <it>S. </it><it>pneumoniae </it>genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome.</p> <p>Conclusions</p> <p>In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing <it>S. pneumoniae </it>physiology and virulence.</p

    Deep Sensitivity Analysis for Objective-Oriented Combinatorial Optimization

    Full text link
    Pathogen control is a critical aspect of modern poultry farming, providing important benefits for both public health and productivity. Effective poultry management measures to reduce pathogen levels in poultry flocks promote food safety by lowering risks of food-borne illnesses. They also support animal health and welfare by preventing infectious diseases that can rapidly spread and impact flock growth, egg production, and overall health. This study frames the search for optimal management practices that minimize the presence of multiple pathogens as a combinatorial optimization problem. Specifically, we model the various possible combinations of management settings as a solution space that can be efficiently explored to identify configurations that optimally reduce pathogen levels. This design incorporates a neural network feedback-based method that combines feature explanations with global sensitivity analysis to ensure combinatorial optimization in multiobjective settings. Our preliminary experiments have promising results when applied to two real-world agricultural datasets. While further validation is still needed, these early experimental findings demonstrate the potential of the model to derive targeted feature interactions that adaptively optimize pathogen control under varying real-world constraints.Comment: The 2023 International Conference on Computational Science & Computational Intelligence (CSCI'23

    EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications

    Full text link
    Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve for life science scientists to understand and use computing languages. An open-source, user-friendly interface for AI models, that does not require programming skills to analyze complex biological data will be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in different 'omics' studies, the number of biological datasets being generated has increased and analyzing these high-throughput datasets is computationally demanding. The majority of AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets, including images, languages, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics.Comment: 2024 7th International Conference on Information and Computer Technologies (ICICT

    Comprehensive proteomic analysis of bovine spermatozoa of varying fertility rates and identification of biomarkers associated with fertility

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Male infertility is a major problem for mammalian reproduction. However, molecular details including the underlying mechanisms of male fertility are still not known. A thorough understanding of these mechanisms is essential for obtaining consistently high reproductive efficiency and to ensure lower cost and time-loss by breeder.</p> <p>Results</p> <p>Using high and low fertility bull spermatozoa, here we employed differential detergent fractionation multidimensional protein identification technology (DDF-Mud PIT) and identified 125 putative biomarkers of fertility. We next used quantitative Systems Biology modeling and canonical protein interaction pathways and networks to show that high fertility spermatozoa differ from low fertility spermatozoa in four main ways. Compared to sperm from low fertility bulls, sperm from high fertility bulls have higher expression of proteins involved in: energy metabolism, cell communication, spermatogenesis, and cell motility. Our data also suggests a hypothesis that low fertility sperm DNA integrity may be compromised because cell cycle: G<sub>2</sub>/M DNA damage checkpoint regulation was most significant signaling pathway identified in low fertility spermatozoa.</p> <p>Conclusion</p> <p>This is the first comprehensive description of the bovine spermatozoa proteome. Comparative proteomic analysis of high fertility and low fertility bulls, in the context of protein interaction networks identified putative molecular markers associated with high fertility phenotype.</p
    corecore