N-TERMINAL PROCESSING OF RIBOSOMAL PROTEIN L27 IN STAPHYLOCOCCUS AUREUS
The bacterial ribosome is essential to cell growth, yet little is known about how its proteins attain their mature structures. Recent studies indicate that certain Staphylococcus aureus bacteriophage protein sequences contain specific sites that may be cleaved by a non-bacteriophage enzyme (Poliakov et al. 2008). The phage cleavage site was found to bear sequence similarity to the N-terminus of S. aureus ribosomal protein L27. Previous studies in E. coli (Wower et al. 1998; Maguire et al. 2005) found that L27 is situated adjacent to the ribosomal peptidyl transferase site, where it likely aids in new peptide formation. The predicted S. aureus L27 protein contains an additional N-terminal sequence not observed within the N-terminus of the otherwise similar E. coli L27; this sequence appears to be cleaved, indicating yet-unobserved ribosomal protein post-translational processing and use of host processes by phage. Phylogenetic analysis shows that L27 processing has the potential to be highly conserved. Further study of this phenomenon may aid antibiotic development.
Interactomics-Based Functional Analysis: Using Interaction Conservation To Probe Bacterial Protein Functions
The emergence of genomics as a discrete field of biology has changed humanity’s understanding of our relationship with bacteria. Sequencing the genome of each newly-discovered bacterial species can reveal novel gene sequences, though the genome may contain genes coding for hundreds or thousands of proteins of unknown function (PUFs). In some cases, these coding sequences appear to be conserved across nearly all bacteria. Exploring the functional roles of these cases ideally requires an integrative, cross-species approach involving not only gene sequences but knowledge of interactions among their products. Protein interactions, studied at genome scale, extend genomics into the field of interactomics. I have employed novel computational methods to provide context for bacterial PUFs and to leverage the rich genomic, proteomic, and interactomic data available for hundreds of bacterial species.
The methods employed in this study began with sets of protein complexes. I initially hypothesized that, if protein interactions reveal protein functions and interactions are frequently conserved through protein complexes, then conserved protein functions should be revealed through the extent of conservation of protein complexes and their components. The subsequent analyses revealed how partial protein complex conservation may, unexpectedly, be the rule rather than the exception. Next, I expanded the analysis by combining sets of thousands of experimental protein-protein interactions. Progressing beyond the scope of protein complexes into interactions across full proteomes revealed novel evolutionary consistencies across bacteria but also exposed deficiencies among interactomics-based approaches. I have concluded this study with an expansion beyond bacterial protein interactions and into those involving bacteriophage-encoded proteins.
This work concerns emergent evolutionary properties among bacterial proteins. It is primarily intended to serve as a resource for microbiologists but is relevant to any research into evolutionary biology. As microbiomes and their occupants become increasingly critical to human health, similar approaches may become increasingly necessary.
Protein Complexes in Bacteria
Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, fewer than 10% are fully conserved among a set of 7 distantly-related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies.
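The idea of fractional conservation described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy complex, the per-genome ortholog sets, and the function names are hypothetical and do not reproduce the study's data or pipeline.

```python
from typing import Dict, Set


def conserved_fraction(components: Set[str], genome_orthologs: Set[str]) -> float:
    """Fraction of a complex's components that have an ortholog in one genome."""
    if not components:
        return 0.0
    return len(components & genome_orthologs) / len(components)


def summarize(components: Set[str], genomes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Conserved fraction of one complex in every genome."""
    return {
        name: conserved_fraction(components, orthologs)
        for name, orthologs in genomes.items()
    }


# Hypothetical toy data: one four-subunit complex and three genomes,
# each represented by the set of ortholog identifiers it contains.
toy_complex = {"A", "B", "C", "D"}
genomes = {
    "genome_1": {"A", "B", "C", "D", "X"},
    "genome_2": {"A", "B"},
    "genome_3": {"A", "X"},
}

fractions = summarize(toy_complex, genomes)

# Components present in every genome form a candidate "conserved core".
core = set.intersection(*(toy_complex & g for g in genomes.values()))
```

In this toy case the complex is fully present in one genome and only partially present in the others, while a single component is shared by all three. Any thresholds for calling a complex "well-conserved" or "partially conserved" would have to come from the study itself.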
Bacterial protein meta-interactomes predict cross-species interactions and protein function
BACKGROUND: Protein-protein interactions (PPIs) can offer compelling evidence for protein function, especially when viewed in the context of proteome-wide interactomes. Bacteria have been popular subjects of interactome studies: more than six different bacterial species have been the subjects of comprehensive interactome studies while several more have had substantial segments of their proteomes screened for interactions. The protein interactomes of several bacterial species have been completed, including several from prominent human pathogens. The availability of interactome data has brought challenges, as these large data sets are difficult to compare across species, limiting their usefulness for broad studies of microbial genetics and evolution.
RESULTS: In this study, we use more than 52,000 unique protein-protein interactions (PPIs) across 349 different bacterial species and strains to determine their conservation across data sets and taxonomic groups. When proteins are collapsed into orthologous groups (OGs) the resulting meta-interactome still includes more than 43,000 interactions, about 14,000 of which involve proteins of unknown function. While conserved interactions provide support for protein function in their respective species data, we found only 429 PPIs (~1% of the available data) conserved in two or more species, rendering any cross-species interactome comparison immediately useful. The meta-interactome serves as a model for predicting interactions, protein functions, and even full interactome sizes for species with limited to no experimentally observed PPIs, including Bacillus subtilis and Salmonella enterica, which are predicted to have up to 18,000 and 31,000 PPIs, respectively.
CONCLUSIONS: In the course of this work, we have assembled cross-species interactome comparisons that will allow interactomics researchers to anticipate the structures of yet-unexplored microbial interactomes and to focus on well-conserved yet uncharacterized interactors for further study. Such conserved interactions should provide evidence for important but yet-uncharacterized aspects of bacterial physiology and may provide targets for anti-microbial therapies.
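Collapsing proteins into orthologous groups and then counting interactions seen in two or more species can be sketched as follows. This is a minimal illustration under stated assumptions: the species codes, protein identifiers, OG labels, and the `build_meta_interactome` helper are all hypothetical, and the authors' actual procedure may differ.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# (species, protein_a, protein_b); all identifiers below are made up.
Interaction = Tuple[str, str, str]


def build_meta_interactome(
    interactions: Iterable[Interaction],
    protein_to_og: Dict[str, str],
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each PPI to an unordered OG pair and record which species report it."""
    meta: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for species, protein_a, protein_b in interactions:
        og_a = protein_to_og.get(protein_a)
        og_b = protein_to_og.get(protein_b)
        if og_a is None or og_b is None:
            continue  # skip proteins with no orthologous-group assignment
        meta[frozenset((og_a, og_b))].add(species)
    return dict(meta)


ppis = [
    ("eco", "ecoP1", "ecoP2"),
    ("bsu", "bsuP1", "bsuP2"),
    ("eco", "ecoP3", "ecoP1"),
    ("bsu", "bsuP9", "bsuP1"),  # bsuP9 has no OG and is dropped
]
protein_to_og = {
    "ecoP1": "OG1", "ecoP2": "OG2", "ecoP3": "OG3",
    "bsuP1": "OG1", "bsuP2": "OG2",
}

meta = build_meta_interactome(ppis, protein_to_og)

# OG-level interactions observed in two or more species.
conserved = [pair for pair, species in meta.items() if len(species) >= 2]
```

Using an unordered pair (`frozenset`) as the key means that A-B and B-A reports of the same interaction are merged, which is one simple way to count unique interactions.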
Clinical Temporal Relation Extraction with Probabilistic Soft Logic Regularization and Global Inference
There has been a steady need in the medical community to precisely extract
the temporal relations between clinical events. In particular, temporal
information can facilitate a variety of downstream applications such as case
report retrieval and medical question answering. Existing methods either
require expensive feature engineering or are incapable of modeling the global
relational dependencies among the events. In this paper, we propose a novel
method, Clinical Temporal ReLation Extraction with Probabilistic Soft Logic
Regularization and Global Inference (CTRL-PG) to tackle the problem at the
document level. Extensive experiments on two benchmark datasets, I2B2-2012 and
TB-Dense, demonstrate that CTRL-PG significantly outperforms baseline methods
for temporal relation extraction. Comment: 10 pages, 4 figures, 7 tables, accepted by AAAI 202
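As a rough illustration of how probabilistic soft logic can act as a regularizer, the sketch below scores a transitivity rule with the Łukasiewicz relaxation that PSL commonly uses. The abstract does not state which rules CTRL-PG applies, so the rule BEFORE(A,B) ∧ BEFORE(B,C) → BEFORE(A,C), the function names, and the example probabilities are assumptions, not the authors' implementation.

```python
def lukasiewicz_and(a: float, b: float) -> float:
    """Soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How strongly the soft rule body -> head is violated (0 means satisfied)."""
    return max(0.0, body - head)


def transitivity_penalty(p_ab: float, p_bc: float, p_ac: float) -> float:
    """Penalty for BEFORE(A,B) AND BEFORE(B,C) -> BEFORE(A,C).

    Inputs are hypothetical model probabilities for the three relations.
    """
    return distance_to_satisfaction(lukasiewicz_and(p_ab, p_bc), p_ac)


# Predictions that agree with transitivity incur no penalty ...
consistent = transitivity_penalty(0.9, 0.8, 0.9)
# ... while a low probability for the implied relation is penalized.
violated = transitivity_penalty(0.9, 0.8, 0.1)
```

A penalty of this kind could be added to a classifier's training loss so that predictions for event pairs in the same document are pushed toward global consistency.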
KG-Hub: building and exchanging biological knowledge graphs.
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org
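The extract-transform-load pattern mentioned above can be sketched in a few lines of Python. This is not KG-Hub's code: the input table, the `EX:` identifiers, the column layout, and the helper names are hypothetical, and the Biolink category and predicate strings are used only as illustrative labels.

```python
import csv
import io
from typing import Dict, List, Tuple

Record = Dict[str, str]


def transform(rows: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Turn gene-disease rows into de-duplicated node records and edge records."""
    nodes: Dict[str, Record] = {}
    edges: List[Record] = []
    for row in rows:
        gene_id, disease_id = row["gene"], row["disease"]
        nodes[gene_id] = {"id": gene_id, "category": "biolink:Gene"}
        nodes[disease_id] = {"id": disease_id, "category": "biolink:Disease"}
        edges.append(
            {"subject": gene_id, "predicate": "biolink:related_to", "object": disease_id}
        )
    return list(nodes.values()), edges


def to_tsv(records: List[Record], fieldnames: List[str]) -> str:
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# "Extract": a hypothetical upstream table, inlined so the sketch is self-contained.
raw = "gene,disease\nEX:gene1,EX:disease1\nEX:gene2,EX:disease1\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# "Transform" and "load" (here, into in-memory TSV strings).
nodes, edges = transform(rows)
nodes_tsv = to_tsv(nodes, ["id", "category"])
edges_tsv = to_tsv(edges, ["subject", "predicate", "object"])
```

Keeping nodes and edges as separate tables is one simple way to let a transformed subgraph be reused or merged with graphs from other sources.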
The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species.
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
The Human Phenotype Ontology in 2024: phenotypes around the world.
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.