
    Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis

    BACKGROUND: Mutations in rpoB, the gene encoding the β subunit of DNA-dependent RNA polymerase, are associated with rifampin resistance in Mycobacterium tuberculosis. Several studies have measured the minimum inhibitory concentration (MIC, defined as the minimum concentration of the antibiotic in a given culture medium below which bacterial growth is not inhibited) of rifampin and determined partial DNA sequences of rpoB in different isolates of M. tuberculosis. However, no model has been constructed to predict rifampin resistance from sequence information alone. Such a model might provide the basis for quantifying rifampin resistance status exclusively from DNA sequence data, and thus eliminate the need for time-consuming culturing and antibiotic testing of clinical isolates. RESULTS: Sequence data for amino acid positions 511–533 of rpoB and the associated MIC of rifampin for different isolates of M. tuberculosis were taken from studies examining rifampin resistance in clinical samples from New York City and throughout Japan. We used tree-based statistical methods and random forests to model the relationship between rpoB amino acid sequence and rifampin resistance. A relatively simple cross-validated tree-based regression model involving two amino acid positions (526 and 531) explains a proportion of 0.679 of the variance. The first partition in the data, based on position 531, yields groups that differ 100-fold in mean MIC (1.596 μg/ml and 159.676 μg/ml). The subsequent partition, based on position 526 (the most variable position in this region), yields a >354-fold difference in MIC. When the problem is treated as classification (susceptible or resistant), a cross-validated tree-based model correctly classified most (0.884) of the observations and was very similar to the regression model. Random forest analysis of the MIC data as a continuous variable (a regression problem) produced a model that explained 0.861 of the variance. Random forest analysis of the MIC data as discrete classes produced a model that correctly classified 0.942 of the observations, with a sensitivity of 0.958 and a specificity of 0.885. CONCLUSIONS: Highly accurate regression and classification models of rifampin resistance can be built from this short sequence region. Models may improve with better (and more consistent) measurements of MIC and with more sequence data.
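The first step of a tree-based regression like the one described above can be illustrated with a minimal sketch. It assumes a simplified split rule (wild-type versus mutant residue at each position), raw MIC as the response, and no cross-validation. The isolate records are invented for illustration and are not the study's data; the wild-type residues (His at 526, Ser at 531) and the names `ISOLATES`, `POSITIONS`, and `best_split` are assumptions of this sketch.

```python
from statistics import mean

# Hypothetical toy records: (residue at 526, residue at 531, MIC in ug/ml).
# Invented for illustration only; NOT the data analysed in the abstract.
ISOLATES = [
    ("H", "S", 0.5), ("H", "S", 1.0), ("H", "S", 2.0),
    ("Y", "S", 64.0), ("D", "S", 32.0),
    ("H", "L", 128.0), ("H", "L", 256.0), ("H", "W", 128.0),
]

# Position -> (column index in a record, assumed wild-type residue).
POSITIONS = {526: (0, "H"), 531: (1, "S")}


def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows):
    """Pick the position whose wild-type/mutant split explains the most MIC variance.

    Returns (position, proportion_of_variance_explained).
    """
    total = sum_sq([r[2] for r in rows])
    best_pos, best_r2 = None, -1.0
    for pos, (col, wt_residue) in POSITIONS.items():
        wild = [r[2] for r in rows if r[col] == wt_residue]
        mutant = [r[2] for r in rows if r[col] != wt_residue]
        # Variance explained = 1 - (within-group sum of squares / total sum of squares).
        r2 = 1.0 - (sum_sq(wild) + sum_sq(mutant)) / total
        if r2 > best_r2:
            best_pos, best_r2 = pos, r2
    return best_pos, best_r2


position, explained = best_split(ISOLATES)
print(f"first split on position {position}, variance explained {explained:.3f}")
```

A full regression tree would apply the same search recursively within each resulting group, and a random forest would average many such trees grown on resampled data.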

    Workplace Injury Litigation


    Sequence-Based Classification Using Discriminatory Motif Feature Selection

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is available at http://www.epibiostat.ucsf.edu/biostat/sen/dmfs/
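The three-partition workflow (discovery, training, validation) can be sketched as follows. This is not the authors' pipeline: the motif finder here is a deliberately crude stand-in that ranks k-mers by the difference in their class frequencies, the classifier is a simple lookup table, and the data are synthetic with a planted `GGG` motif that makes the task trivially easy. All function names are hypothetical.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discover_motifs(seqs, labels, k=3, top=1):
    """Stand-in motif finder: rank k-mers by |freq in class 1 - freq in class 0|."""
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for s, y in zip(seqs, labels):
        (pos if y == 1 else neg).update(kmers(s, k))
    score = {m: abs(pos[m] / n_pos - neg[m] / n_neg) for m in set(pos) | set(neg)}
    return sorted(score, key=lambda m: (-score[m], m))[:top]


def featurize(seq, motifs):
    """Presence/absence vector; works for unaligned, unequal-length sequences."""
    return tuple(int(m in seq) for m in motifs)


def train(seqs, labels, motifs):
    """Stand-in classifier: majority label per feature vector, with a global default."""
    votes = {}
    for s, y in zip(seqs, labels):
        votes.setdefault(featurize(s, motifs), Counter())[y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    return lambda s: table.get(featurize(s, motifs), default)


def make_toy_data(n_per_class=60, seed=0):
    """Synthetic data: class 1 carries a planted 'GGG'; background uses A, C, T only."""
    rng = random.Random(seed)
    data = []
    for i in range(2 * n_per_class):
        label = i % 2
        seq = "".join(rng.choice("ACT") for _ in range(rng.randint(15, 25)))
        if label == 1:
            p = rng.randrange(len(seq) + 1)
            seq = seq[:p] + "GGG" + seq[p:]
        data.append((seq, label))
    rng.shuffle(data)
    return data


def split_xy(part):
    seqs, labels = zip(*part)
    return list(seqs), list(labels)


data = make_toy_data()
third = len(data) // 3
disc_x, disc_y = split_xy(data[:third])              # discovery: find motifs
train_x, train_y = split_xy(data[third:2 * third])   # training: fit classifier
val_x, val_y = split_xy(data[2 * third:])            # validation: assess only

motifs = discover_motifs(disc_x, disc_y, k=3, top=1)
predict = train(train_x, train_y, motifs)
val_acc = sum(predict(s) == y for s, y in zip(val_x, val_y)) / len(val_y)
print(motifs, val_acc)
```

The point of the design is that the sequences used to choose features are never reused to fit or assess the classifier, so the validation estimate is not inflated by feature selection. Either stand-in can be swapped for a real motif finder or classifier without changing the partitioning.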

    Improving Assessment of Drug Safety Through Proteomics: Early Detection and Mechanistic Characterization of the Unforeseen Harmful Effects of Torcetrapib.

    Background: Early detection of adverse effects of novel therapies and understanding of their mechanisms could improve the safety and efficiency of drug development. We have retrospectively applied large-scale proteomics to blood samples from ILLUMINATE (Investigation of Lipid Level Management to Understand its Impact in Atherosclerotic Events), a trial of torcetrapib (a cholesterol ester transfer protein inhibitor) that involved 15 067 participants at high cardiovascular risk. ILLUMINATE was terminated at a median of 550 days because of significant absolute increases of 1.2% in cardiovascular events and 0.4% in mortality with torcetrapib. The aims of our analysis were to determine whether a proteomic analysis might reveal biological mechanisms responsible for these harmful effects and whether harmful effects of torcetrapib could have been detected early in the ILLUMINATE trial with proteomics. Methods: A nested case-control analysis of paired plasma samples at baseline and at 3 months was performed in 249 participants assigned to torcetrapib plus atorvastatin and 223 participants assigned to atorvastatin only. Within each treatment arm, cases with events were matched to controls 1:1. Main outcomes were a survey of 1129 proteins for discovery of biological pathways altered by torcetrapib and a 9-protein risk score validated to predict myocardial infarction, stroke, heart failure, or death. Results: Plasma concentrations of 200 proteins changed significantly with torcetrapib. Pathway analysis of these proteins revealed unexpected and widespread changes in immune and inflammatory functions, as well as changes in endocrine systems, including aldosterone function and glycemic control. At baseline, 9-protein risk scores were similar in the 2 treatment arms and higher in participants with subsequent events. At 3 months, the absolute 9-protein derived risk increased in the torcetrapib plus atorvastatin arm compared with the atorvastatin-only arm by 1.08% (P=0.0004). Of 49 proteins previously associated with cardiovascular and mortality risk, 37 changed in the direction of increased risk. Conclusions: Heretofore unknown effects of torcetrapib were revealed in immune and inflammatory functions. A protein-based risk score predicted harm from torcetrapib within just 3 months. A protein-based risk assessment embedded within a large proteomic survey may prove to be useful in the evaluation of therapies to prevent harm to patients. Clinical trial registration: URL: https://www.clinicaltrials.gov. Unique identifier: NCT00134264

    Regression Approaches for Microarray Data Analysis

    A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures, which far exceeds the number of samples (observations) available, and (ii) the fact that, by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have recently been proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
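To make the regularized-regression idea concrete, here is a minimal sketch of the lasso fitted by cyclic coordinate descent in a setting with more genes (6) than samples (4). It is an illustration of the general technique, not the procedure evaluated in the abstract. The objective assumed here is (1/2n)·||y − Xβ||² + λ·||β||₁; the expression matrix and outcome are invented, and `lasso_cd` and `soft_threshold` are hypothetical names.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_norm = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_norm[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Invented toy data: 4 samples (rows) by 6 genes (columns), so p > n.
# Gene 3 is an exact multiple of gene 0, mimicking correlated expression.
X = [
    [1.0, 1.0, 1.0, 0.5, 1.0, 1.0],
    [-1.0, 1.0, -1.0, -0.5, 1.0, 0.0],
    [1.0, -1.0, -1.0, 0.5, 1.0, 0.0],
    [-1.0, -1.0, 1.0, -0.5, 1.0, 0.0],
]
y = [2.0, -2.0, 2.0, -2.0]  # outcome driven by gene 0 only (y = 2 * gene 0)

beta = lasso_cd(X, y, lam=0.1)
print(beta)
```

On this toy input the fit keeps only gene 0, with its coefficient shrunk from 2 to about 1.9 by the penalty, and sets the correlated gene 3 to exactly zero. Ordinary least squares has no unique solution here because p > n; the penalty is what selects a single sparse answer.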

    Oral Direct-Acting Agent Therapy for Hepatitis C Virus Infection: A Systematic Review

    Rapid improvements in hepatitis C virus (HCV) therapy have led to the approval of multiple oral direct-acting antiviral (DAA) regimens by the U.S. Food and Drug Administration (FDA) for treatment of chronic HCV infection