494 research outputs found

    In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence

    Background: Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. Results: In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. Conclusion: The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. Future genotyping methods should provide data that can be stored centrally and accessed locally in an easily transferable, informative and extensible format based on comparative genomic analyses.

    The Era of Commercialized Genetics: Examining the Intersection of DNA, Identity, and Personal Origin


    Tempering the Adversary: An Exploration into the Applications of Game Theoretic Feature Selection and Regression

    Most modern machine learning algorithms tend to focus on an average-case approach, where every data point contributes the same amount of influence towards calculating the fit of a model. This per-data point error (or loss) is averaged together into an overall loss and typically minimized with an objective function. However, this can be insensitive to valuable outliers. Inspired by game theory, the goal of this work is to explore the utility of incorporating an optimally-playing adversary into feature selection and regression frameworks. The adversary assigns weights to the data elements so as to degrade the modeler's performance in an optimal manner, thereby forcing the modeler to construct a more robust solution. A tuning parameter enables tempering of the power wielded by the adversary, allowing us to explore the spectrum between average case and worst case. By formulating our method as a linear program, it can be solved efficiently, and can accommodate sub-population constraints, a feature that other related methods cannot easily implement. We feel that the need to generate models while understanding the influence of sub-population constraints should be particularly prominent in biomedical literature, and though our method was developed in response to the ubiquity of sub-population data and outliers that exist in this realm, our method is generic and can be applied to data sets that are not exclusively biomedical in nature. We additionally explore the implementation of our method as an adversarial regression problem. Here, instead of providing the user with a fitting of parameters for the model, we provide the user with an ensemble of parameters which can be tuned based on sensitivity to outliers and various sub-population constraints. Finally, to help foster a better understanding of various data sets, we will discuss potential automated applications of our method which will enable data scientists to explore underlying relationships and sensitivities that may be a consequence of sub-populations and meaningful outliers.
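
The adversary's weighting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not say how the tuning parameter enters the linear program, so the per-point weight cap `cap` below is an assumption, as are the function names `adversary_weights` and `weighted_loss`. The sketch covers only the adversary's move for fixed per-point losses (weights sum to 1, each weight at most `cap`), which this small linear program solves greedily by loading the largest losses first. It omits the modeler's fit and the sub-population constraints.

```python
def adversary_weights(losses, cap):
    """Return weights that maximize sum(w_i * loss_i).

    Constraints (assumed for illustration): w_i >= 0, sum(w_i) == 1,
    and w_i <= cap. cap = 1/n forces uniform weights (average case);
    cap = 1 lets all weight sit on the worst point (worst case).
    """
    n = len(losses)
    if n == 0:
        raise ValueError("losses must not be empty")
    if cap * n < 1.0 - 1e-12 or cap > 1.0:
        raise ValueError("cap must lie between 1/n and 1")

    weights = [0.0] * n
    remaining = 1.0
    # Greedy solution of the LP: give the largest losses as much
    # weight as the cap allows until the unit budget is spent.
    for i in sorted(range(n), key=lambda k: losses[k], reverse=True):
        if remaining <= 0.0:
            break
        w = min(cap, remaining)
        weights[i] = w
        remaining -= w
    return weights


def weighted_loss(losses, weights):
    """Overall loss seen by the modeler under the adversary's weights."""
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    losses = [1.0, 4.0, 2.0, 3.0]
    for cap in (0.25, 0.5, 1.0):
        w = adversary_weights(losses, cap)
        print(cap, w, weighted_loss(losses, w))
```

Under these assumptions, raising `cap` from 1/n to 1 moves the objective from the mean loss to the maximum loss, which is one way to read the "spectrum between average case and worst case" mentioned in the abstract.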

    Quantitative Phenotype Analysis to Identify, Validate and Compare Rat Disease Models

    Introduction: The laboratory rat has been widely used as an animal model in biomedical research. There are many strains exhibiting a wide variety of phenotypes. Capturing these phenotypes in a centralized database provides researchers with an easy method for choosing the appropriate strains for their studies. Current resources such as NBRP and PhysGen have provided some preliminary work in rat phenotype databases. However, there are drawbacks in both projects: (1) NBRP used a small number of animals (6 rats); (2) the NBRP project is a one-time effort for each strain; (3) the PhysGen web interface only enables queries within a single study, so data comparison and integration are not possible; (4) PhysGen lacks a data standardization process, so the measurement method, experimental condition, and age of the rats used are hidden. Therefore, there is a need for a better data integration and visualization method in order to provide users with more insights about phenotype differences across rat strains. The Rat Genome Database (RGD) PhenoMiner tool has provided the first step in this effort by standardizing and integrating data from individual studies as well as NBRP and PhysGen. Methods: Our work involved the following key steps: (1) we developed a meta-analysis pipeline to automatically integrate data from heterogeneous sources and to produce expected ranges (standardized phenotype ranges) for different strains and different phenotypes under different experimental conditions; (2) we created tools to visualize expected ranges for individual strains and strain groups; (3) we clustered substrains into different sub-populations according to phenotype correlations. Results: We developed a meta-analysis pipeline and an interactive web interface that summarizes and visualizes expected ranges produced from the meta-analysis pipeline. Automation of the pipeline allows for updates as additional data become available. The interactive web interface provides researchers with a platform for identifying and validating expected ranges for a variety of quantitative phenotypes. In addition, we performed a preliminary cluster analysis that enables researchers to examine similarities of strains, substrains, and different sex or age groups of strains on a multi-dimensional scale by using multiple phenotype features. Conclusion: The data resources and the data mining and visualization tools will promote an understanding of rat disease models, guide researchers to choose optimal strains for their research needs, and encourage data sharing from different research hubs. Such resources also help to promote research reproducibility. Data produced and interactive platforms created in this project will continue to provide a valuable resource for Translational Research efforts.

    Molecular Systematics of Selected Members of the Black Basses, Genus Micropterus, With Concentration on the Spotted Bass (M. punctulatus) Species Complex

    This study examined genetic relationships among selected populations of black basses. These centrarchid fishes, separated by both physical barriers (land formations) and distance, have shown varying degrees of differentiation, but retain many morphometric characters in common. Eight populations representing four taxa and geographical extremes in the genus Micropterus, with concentration on the spotted bass complex, were selected and evaluated for biochemical genetic characters. This study examined two species and two subspecies of spotted basses. The type species from Kentucky represented Micropterus punctulatus punctulatus; a population from Alabama represented M. p. henshalli. A Texas population, previously classified as conspecific with spotted bass but now listed as a distinct species, was included. One primary objective of this study was to determine where the Louisiana populations of M. punctulatus align within this group, as these populations are found at a central geographic position in the distribution of these differentiated basses. Since previous studies have revealed low levels of genetic variability, a technique more sensitive to genetic differences was used and compared to results from allozyme analysis, the more traditional method for assessing genetic differentiation. Both allozyme analysis and random amplified polymorphic DNA-polymerase chain reaction (RAPD-PCR) were used to assess genetic relationships. These two techniques resolved very different relationships. The allozyme study showed the type species, Kentucky bass, as most divergent, but supported the predicted relationships among the remaining four populations. The RAPD-PCR results were in basic agreement with the expected taxonomy. Based on similarities at 302 polymorphic RAPD loci, the two Louisiana and Kentucky populations closely clustered, with the subspecies M. p. henshalli the next most divergent, and M. treculi diverging next, but completing a cohesive cluster with the other spotted bass relative to the outgroups. An as yet unnamed new form from Florida, the Chipola bass, was also analyzed with this technique. RAPD-PCR results place this form at approximately equal distances from the other two outgroup species and the punctulatus group. Therefore, this analysis would support species recognition for the Chipola bass, and regrouping the Texas strain of spotted bass in the M. punctulatus species complex.

    General Principles for the Validation of Proarrhythmia Risk Prediction Models: An Extension of the CiPA In Silico Strategy

    This white paper presents principles for validating proarrhythmia risk prediction models for regulatory use as discussed at the In Silico Breakout Session of a Cardiac Safety Research Consortium/Health and Environmental Sciences Institute/US Food and Drug Administration-sponsored Think Tank Meeting on May 22, 2018. The meeting was convened to evaluate the progress in the development of a new cardiac safety paradigm, the Comprehensive in Vitro Proarrhythmia Assay (CiPA). The opinions regarding these principles reflect the collective views of those who participated in the discussion of this topic both at and after the breakout session. Although primarily discussed in the context of in silico models, these principles describe the interface between experimental input and model-based interpretation and are intended to be general enough to be applied to other types of nonclinical models for proarrhythmia assessment. This document was developed with the intention of providing a foundation for more consistency and harmonization in developing and validating different models for proarrhythmia risk prediction using the example of the CiPA paradigm.

    Clinical, genetic and molecular aspects of membranous nephropathy

    Membranous Nephropathy (MN) is one of the leading causes of end-stage renal disease (ESRD). MN is an autoimmune disease in which autoantibodies target antigens at the level of the glomerular basement membrane. The nature of these antibodies and the reason why they develop are not fully understood. One of the strategies towards a better understanding of the disorder is genetic analysis, of which two approaches have been attempted: linkage mapping, based on a family suggestive of X-linked transmission of the MN trait; and whole genome association mapping, based on three case-control cohorts. The first cohort (335 cases and ethnically matched controls from the UK) was genotyped using SNP markers and analysed in an exploratory study which led to the identification of two highly significant loci of association. Two cohorts (146 biopsy proven MN cases and ethnically matched controls from the Dutch research group in Nijmegen and 75 biopsy proven cases and ethnically matched controls from the French research group in Paris) were used to successfully replicate the results. The two loci which we identified and independently confirmed are located on chromosome 2 and on chromosome 6. The chromosome 2 locus includes the PLA2R gene, confirming the hypothesis of Beck et al., who identified PLA2R as a key antigen in idiopathic MN by using an immunological approach [1]. The chromosome 6 locus lies within the extended Human Leukocyte Antigen (HLA) system locus, with the highest significance for association reached by alleles of HLA-DQA1. Our results suggest that the susceptibility to membranous nephropathy is associated with genetic variants at the level of both PLA2R1 and HLA loci. The causative variants could be some of the polymorphisms captured by the genotyping array which was analysed or, more likely, variants (single nucleotide or copy number variant type) situated nearby (and therefore in linkage disequilibrium).

    Discovery of Type 2 Diabetes Trajectories from Electronic Health Records

    University of Minnesota Ph.D. dissertation. September 2020. Major: Health Informatics. Advisor: Gyorgy Simon. 1 computer file (PDF); xiii, 110 pages. Type 2 diabetes (T2D) is one of the fastest growing public health concerns in the United States. There were 30.3 million patients (9.4% of the US population) suffering from diabetes in 2015. Diabetes, which is the seventh leading cause of death in the United States, is known to be a non-reversible (incurable) chronic disease, leading to severe complications, including chronic kidney disease, amputation, blindness, and various cardiac and vascular diseases. Early identification of patients at high risk is regarded as the most effective clinical tool to prevent or delay the development of diabetes, allowing patients to change their lifestyle or to receive medication earlier. In turn, these interventions can help decrease the risk of diabetes by 30-60%. Many studies have been conducted aiming at the early identification of patients at high risk in clinical settings. These studies typically only consider the patient's current state at the time of the assessment and do not fully utilize all available information such as the patient's medical history. Past history is important. It has been shown that laboratory results and vital signs can differ between diabetic and non-diabetic patients as many as 15-20 years before the onset of diabetes. We have also shown in our study that the order in which patients develop diabetes-related comorbidities is predictive of their diabetes risk even after adjusting for the severity of the comorbidities. In this thesis, we develop multiple novel methods to discover T2D trajectories from Electronic Health Records (EHR). We define a trajectory as the order in which diseases develop. We aim to discover typical and atypical trajectories, where typical trajectories represent predominant patterns of progression and atypical trajectories refer to the rest of the trajectories. Revealing trajectories can allow us to divide patients into subpopulations that can uncover the underlying etiology of diabetes. More importantly, by assessing the risk correctly and by better understanding the heterogeneity of diabetes, we can provide better care. Since EHR data pose several challenges to identifying trajectories directly, we devise four specific studies to address the challenges: first, we propose a new knowledge-driven representation for clinical data mining; second, we demonstrate a method for estimating the onset time of slow-onset diseases from intermittently observable laboratory results in the specific context of T2D; third, we present a method to infer trajectories, the sequence of comorbidities potentially leading up to a particular disease of interest; and finally, we propose a novel method to discover multiple trajectories from EHR data. The patterns we discovered from the above four studies address a clinical issue, are clinically verifiable and are amenable to deployment in practice to improve the quality of individual patient care towards promoting public health in the United States.