30 research outputs found
Genomic epidemiology of multidrug‐resistant Gram‐negative organisms
The emergence and spread of antibiotic‐resistant Gram‐negative bacteria (rGNB) across global healthcare networks presents a significant threat to public health. As the number of effective antibiotics available to treat these resistant organisms dwindles, it is essential that we devise more effective strategies for controlling their proliferation. Recently, whole‐genome sequencing has emerged as a disruptive technology that has transformed our understanding of the evolution and epidemiology of diverse rGNB species, and it has the potential to guide strategies for controlling the evolution and spread of resistance. Here, we review specific areas in which genomics has already made a significant impact, including outbreak investigations, regional epidemiology, clinical diagnostics, resistance evolution, and the study of epidemic lineages. While highlighting early successes, we also point to the next steps needed to translate this technology into strategies to improve public health and clinical medicine.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/147016/1/nyas13672.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/147016/2/nyas13672_am.pd
Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network
An evidence-weighted functional-linkage network of human genes reveals associations among diseases that share no known disease genes and have dissimilar phenotype
Comparative assessment of performance and genome dependence among phylogenetic profiling methods
BACKGROUND: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes. RESULTS: In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results. 
CONCLUSION: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes
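As a rough illustration of the profile comparison this abstract describes, the sketch below discretizes per-genome best-hit E-values into a presence/absence profile and scores two genes by how often they co-occur. The 1e-5 cutoff, the Jaccard score, and the function names are assumptions chosen for illustration; the abstract does not state which threshold or similarity measure the study used.

```python
def binary_profile(evalues, threshold=1e-5):
    """Discretize best-hit E-values (one per reference genome) into 0/1.

    None means no BLAST hit was found in that genome.
    The threshold is an illustrative assumption, not a value from the study.
    """
    return [1 if e is not None and e <= threshold else 0 for e in evalues]


def jaccard(profile_a, profile_b):
    """Share of genomes containing both genes among genomes containing either."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical E-values for two genes across five reference genomes.
gene_x = binary_profile([1e-30, 0.5, 1e-12, None, 1e-8])
gene_y = binary_profile([1e-25, 2.0, 1e-10, None, 0.3])
similarity = jaccard(gene_x, gene_y)
```

Gene pairs with high similarity would be candidate functional linkages. A continuous profile built from transformed bit-scores, as mentioned in the abstract, would need a different similarity measure.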
Towards the identification of essential genes using targeted genome sequencing and comparative analysis
BACKGROUND: The identification of genes essential for survival is of theoretical importance in the understanding of the minimal requirements for cellular life, and of practical importance in the identification of potential drug targets in novel pathogens. With the great time and expense required for experimental studies aimed at constructing a catalog of essential genes in a given organism, a computational approach which could identify essential genes with high accuracy would be of great value. RESULTS: We gathered numerous features which could be generated automatically from genome sequence data and assessed their relationship to essentiality, and subsequently utilized machine learning to construct an integrated classifier of essential genes in both S. cerevisiae and E. coli. When looking at single features, phyletic retention, a measure of the number of organisms an ortholog is present in, was the most predictive of essentiality. Furthermore, during construction of our phyletic retention feature we for the first time explored the evolutionary relationship among the set of organisms in which the presence of a gene is most predictive of essentiality. We found that in both E. coli and S. cerevisiae the optimal sets always contain host-associated organisms with small genomes which are closely related to the reference. Using five optimally selected organisms, we were able to improve predictive accuracy as compared to using all available sequenced organisms. We hypothesize the predictive power of these genomes is a consequence of the process of reductive evolution, by which many parasites and symbionts evolved their gene content. In addition, essentiality is measured in rich media, a condition which resembles the environments of these organisms in their hosts where many nutrients are provided. Finally, we demonstrate that integration of our most highly predictive features using a probabilistic classifier resulted in accuracies surpassing any individual feature. 
CONCLUSION: Using features obtainable directly from sequence data, we were able to construct a classifier which can predict essential genes with high accuracy. Furthermore, our analysis of the set of genomes in which the presence of a gene is most predictive of essentiality may suggest ways in which targeted sequencing can be used in the identification of essential genes. In summary, the methods presented here can aid in the reduction of time and money invested in essential gene identification by targeting those genes for experimentation which are predicted as being essential with a high probability
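Phyletic retention, as defined in this abstract, is a count of the organisms in which an ortholog is present. A minimal sketch follows, assuming ortholog presence is already available as a table. The function name, the data layout, and the gene and genome names are hypothetical. The optional `genomes` argument mimics restricting the count to a selected subset of organisms.

```python
def phyletic_retention(presence, genomes=None):
    """Count, for each gene, the genomes in which an ortholog is present.

    presence: dict mapping gene -> set of genome names with an ortholog.
    genomes:  optional collection restricting the count to a chosen subset.
    """
    selected = None if genomes is None else set(genomes)
    return {
        gene: len(found if selected is None else found & selected)
        for gene, found in presence.items()
    }


# Hypothetical ortholog table; gene and genome names are made up.
presence = {
    "geneA": {"genome1", "genome2", "genome3", "genome4"},
    "geneB": {"genome2"},
    "geneC": {"genome1", "genome3"},
}
all_scores = phyletic_retention(presence)
subset_scores = phyletic_retention(presence, genomes=["genome1", "genome3"])
```

Comparing `all_scores` with `subset_scores` shows how the ranking of genes can change when only a few genomes are used.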
High-precision high-coverage functional inference from integrated data sources
BACKGROUND: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation. RESULTS: We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than the use of individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverage. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30% down to 3%. In addition, a scoring scheme to estimate the precision of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that it can be successfully applied to less studied organisms.
CONCLUSION: We provide a general two-step function-annotation framework, and show that high-coverage, high-precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration, followed by applying a maximum weight decision rule.
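The maximum weight decision rule mentioned in this abstract can be sketched as follows, assuming the FLN is stored as a nested dictionary of edge weights. The data layout, the function name, and the choice to transfer every function of the best-linked annotated neighbor are assumptions for illustration; the abstract does not specify these details.

```python
def annotate_by_max_weight(network, known_functions):
    """Annotate each unannotated protein from its highest-weight annotated neighbor.

    network:         dict protein -> dict neighbor -> edge weight.
    known_functions: dict protein -> set of function labels.
    Returns dict protein -> (sorted function labels, supporting edge weight).
    """
    predictions = {}
    for protein, neighbors in network.items():
        if protein in known_functions:
            continue  # already annotated
        candidates = [
            (weight, neighbor)
            for neighbor, weight in neighbors.items()
            if neighbor in known_functions
        ]
        if not candidates:
            continue  # no annotated neighbor to borrow from
        weight, neighbor = max(candidates)
        predictions[protein] = (sorted(known_functions[neighbor]), weight)
    return predictions


# Hypothetical network and annotations.
network = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P4": {"P3": 0.2, "P5": 0.7},
}
known = {"P2": {"glycolysis"}, "P3": {"transport"}}
predictions = annotate_by_max_weight(network, known)
```

Keeping the supporting edge weight next to each prediction allows low-confidence annotations to be filtered afterwards, which is one simple way to trade coverage for precision.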
Model-driven analysis of experimentally determined growth phenotypes for 465 yeast gene deletion mutants under 16 different conditions
An iterative approach that integrates high-throughput measurements of yeast deletion mutants and flux balance model predictions improves understanding of both experimental and computational results
Epistatic interaction maps relative to multiple metabolic phenotypes
An epistatic interaction between two genes occurs when the phenotypic impact of one gene depends on another gene, often exposing a functional association between them. Due to experimental scalability and to evolutionary significance, abundant work has been focused on studying how epistasis affects cellular growth rate, most notably in yeast. However, epistasis likely influences many different phenotypes, affecting our capacity to understand cellular functions, biochemical networks adaptation, and genetic diseases. Despite its broad significance, the extent and nature of epistasis relative to different phenotypes remain fundamentally unexplored. Here we use genome-scale metabolic network modeling to investigate the extent and properties of epistatic interactions relative to multiple phenotypes. Specifically, using an experimentally refined stoichiometric model for Saccharomyces cerevisiae, we computed a three-dimensional matrix of epistatic interactions between any two enzyme gene deletions, with respect to all metabolic flux phenotypes. We found that the total number of epistatic interactions between enzymes increases rapidly as phenotypes are added, plateauing at approximately 80 phenotypes, to an overall connectivity that is roughly 8-fold larger than the one observed relative to growth alone. Looking at interactions across all phenotypes, we found that gene pairs interact incoherently relative to different phenotypes, i.e. antagonistically relative to some phenotypes and synergistically relative to others. Specific deletion-deletion-phenotype triplets can be explained metabolically, suggesting a highly informative role of multi-phenotype epistasis in mapping cellular functions. Finally, we found that genes involved in many interactions across multiple phenotypes are more highly expressed, evolve slower, and tend to be associated with diseases, indicating that the importance of genes is hidden in their total phenotypic impact. 
Our predictions indicate a pervasiveness of nonlinear effects in how genetic perturbations affect multiple metabolic phenotypes. The approaches and results reported could influence future efforts in understanding metabolic diseases and the role of biochemical regulation in the cell.
This work was supported by grants from the National Institutes of Health (1RC2GM092602-01, 1R01GM089978, and 1R01GM078209). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Published version
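To make the idea of incoherent epistasis concrete, the sketch below scores one gene pair against two phenotypes using a common multiplicative definition, epsilon = W_xy - W_x * W_y, where each W is a phenotype value normalized to the wild type. This definition, the sign convention (positive as antagonistic, negative as synergistic), and all numbers are assumptions for illustration; the abstract does not state the exact measure used in the study.

```python
def epistasis(w_x, w_y, w_xy):
    """Multiplicative epistasis: observed double-mutant value minus the
    product of the single-mutant values (all normalized to wild type)."""
    return w_xy - w_x * w_y


def classify(eps, tolerance=1e-9):
    """Label the sign of an epistasis score (illustrative convention)."""
    if eps > tolerance:
        return "antagonistic"
    if eps < -tolerance:
        return "synergistic"
    return "none"


# Hypothetical normalized phenotypes (single x, single y, double mutant)
# for the same gene pair measured against two different phenotypes.
phenotypes = {
    "growth": (0.8, 0.5, 0.2),
    "flux_1": (0.8, 0.5, 0.5),
}
interactions = {
    name: classify(epistasis(*values)) for name, values in phenotypes.items()
}
```

In this made-up example the same pair is synergistic with respect to growth and antagonistic with respect to the second phenotype, which is the kind of incoherence the abstract reports.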
Optimality criteria for the prediction of metabolic fluxes in yeast mutants
Constraint-based models of cellular metabolism, such as flux balance analysis (FBA), use convex analysis and optimization to study metabolic networks at a genome scale. The availability of reaction lists for numerous organisms, along with a variety of network analysis and optimization tools, is making these approaches increasingly popular for metabolic engineering and biomedical applications, as well as for addressing fundamental biological questions. It is therefore very important to assess the predictive capacity of these models and to understand how to interpret them in a biologically relevant manner. Typically, model assessment is limited to gauging the ability to predict phenotypes, such as viability under different environmental and genetic conditions. These types of assessments, for the most part, focus only on the growth phenotype of the cells, but ignore the underlying flux predictions. While this may be sufficient for certain types of study, the question of whether flux balance models can reliably predict intracellular and transport fluxes is crucial for more detailed analysis, and remains largely unanswered. Here we compare FBA model predictions of yeast metabolic fluxes to a previously published set of experimentally determined fluxes for 13 different single gene deletion mutants across a variety of possible objective functions. We find that the specific optimization criteria used to determine fluxes have a significant impact on the accuracy of the predicted fluxes. Interestingly, while different optimization methods provide very different levels of agreement relative to experimental fluxes, they tend to provide similar predictions with respect to the effect of the perturbation on growth. This demonstrates that assessment of models at the level of flux predictions is a critical step in assessing the biological validity of different models and optimization criteria
cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets
Abstract
Background
The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis.
Results
We illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h.
Conclusions
We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.
http://deepblue.lib.umich.edu/bitstream/2027.42/173436/1/12859_2021_Article_3981.pd
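For readers unfamiliar with concatenated gene alignments, the following Python sketch shows the general idea: per-gene alignments are joined end to end so that each genome gets one long aligned sequence. It does not use or reproduce the cognac R package or its interface. The function name, the data layout, and the gap padding for genomes missing a gene are assumptions for illustration.

```python
def concatenate_alignments(gene_alignments, genomes):
    """Join per-gene alignments into one aligned sequence per genome.

    gene_alignments: list of dicts mapping genome -> aligned sequence;
                     sequences within one gene must have equal length.
    genomes:         ordered list of genome names to include.
    Genomes lacking a gene are padded with gaps (an illustrative choice).
    """
    parts = {genome: [] for genome in genomes}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment must share one length")
        width = lengths.pop()
        for genome in genomes:
            parts[genome].append(alignment.get(genome, "-" * width))
    return {genome: "".join(chunks) for genome, chunks in parts.items()}


# Hypothetical alignments for two genes across three genomes.
gene_a = {"g1": "ATG-", "g2": "ATGC", "g3": "ATGA"}
gene_b = {"g1": "CC", "g2": "CT"}  # g3 lacks this gene
concatenated = concatenate_alignments([gene_a, gene_b], ["g1", "g2", "g3"])
```

The resulting sequences all have the same length, so they can be written out as a single alignment for tree inference.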