    Cellwise Robust M Regression

    The cellwise robust M regression (CRM) estimator is introduced as the first estimator of its kind that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against vertical outliers and leverage points. As a by-product, the method yields a weighted and imputed data set containing estimates of the values the cellwise outliers would have needed to take in order to fit the model. Illustrations show the method to be as robust as its casewise counterpart, MM regression. Because it downweights individual cells rather than entire cases, the cellwise method discards less information than any casewise robust estimator, so its predictive power can be expected to be at least as good as that of casewise alternatives. A simulation study corroborates these results: predictive performance is at least on par with casewise methods, and sometimes better. Moreover, an application to a data set of Swiss nutrient compositions shows that in individual cases CRM can achieve significantly higher predictive accuracy than MM regression.
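    To make the casewise/cellwise distinction concrete, here is a minimal sketch of a casewise Huber M regression fitted by iteratively reweighted least squares (IRLS), in plain Python. It is not CRM and not MM regression: it is a simpler casewise M-estimator swapped in to show the weighting that CRM refines. Each observation gets a single weight, so one contaminated cell downweights the whole row. The tuning constant c = 1.345, the normalized MAD residual scale, the helper names and the toy data are assumptions chosen for illustration.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def huber_m_regression(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Casewise Huber M regression via IRLS (hypothetical helper, NOT CRM).

    Each case (row) receives ONE weight, so a single bad cell
    downweights the entire observation.
    Returns (a, b, weights from the last reweighting step).
    """
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # ordinary least-squares start
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, normalized
        # so it estimates the standard deviation under Gaussian errors.
        med = statistics.median(r)
        s = 1.4826 * statistics.median([abs(ri - med) for ri in r])
        if s == 0:
            break
        # Huber weights: 1 inside [-c*s, c*s], c*s/|r| outside.
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w


# Toy data (synthetic): y = 2 + 3x plus small noise, then three vertical outliers.
x = list(range(20))
y = [2 + 3 * xi + ((xi * 7) % 5 - 2) * 0.1 for xi in x]
for i in (3, 10, 17):
    y[i] += 40.0

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_hub, b_hub, w_hub = huber_m_regression(x, y)
print(f"OLS:   a={a_ols:.2f}, b={b_ols:.3f}")
print(f"Huber: a={a_hub:.2f}, b={b_hub:.3f}")
```

The outlying rows end up with weights far below 1, which is exactly the casewise behaviour: the whole row, including any clean predictor values it holds, loses influence. A cellwise method such as CRM instead aims to flag and impute only the offending cells, which is why it can retain more information.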

    Enhanced heterogeneity of rpoB in Mycobacterium tuberculosis found at low pH

    OBJECTIVES: The aim of this study was to gain insight into the molecular mechanisms underlying the evolution of rifampicin resistance in response to controlled changes in the environment. METHODS: In a steady-state chemostat culture at pH 7.0 and pH 6.2, we determined the proportion of rpoB mutants and characterized the sequences of the mutations found in the rifampicin resistance-determining region of rpoB. RESULTS: The overall proportion of rpoB mutants of strain H37Rv remained constant for 37 days at pH 7.0, ranging between 3.6 × 10⁻⁸ and 8.9 × 10⁻⁸; however, the spectrum of mutations varied. The most commonly detected mutation, a serine-to-leucine substitution at codon 531 (S531L), increased from 40% to 89%, while other mutations (S531W, H526Y, H526D, H526R, S522L and D516V) decreased over the 37-day sampling period. Changing the pH from 7.0 to 6.2 did not significantly alter the overall proportion of mutants, but it reduced the percentage of strains harbouring S531L (from 89% to 50%) and increased the number of distinct mutations from 4 to 12. CONCLUSIONS: The data confirm that strains carrying the S531L mutation are fitter than strains carrying other mutations. We also conclude that the low-pH environment is permissive for a wider spectrum of mutations, which may give a successful mutant more opportunities to survive.

    Simultaneous mapping of multiple gene loci with pooled segregants

    The analysis of polygenic phenotypic characteristics, such as quantitative traits or heritable diseases, remains an important challenge. It requires reliable scoring of many genetic markers covering the entire genome. High-throughput sequencing technologies provide a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as in bulked segregant analysis (BSA), should in principle allow the simultaneous mapping of multiple genetic loci throughout the genome. The gene mapping process applied here consists of three steps. First, parents with and without the trait are crossed in a controlled manner. Second, the offspring are selected by phenotypic screening, and short offspring sequence reads are mapped against the parental reference. Third, genetic markers such as SNPs, insertions and deletions are detected with next-generation sequencing (NGS). Markers close to genomic loci associated with the trait are more likely to be inherited together with those loci. Hence, these markers are very useful for discovering the loci and the genetic mechanism underlying the characteristic of interest. In this context, NGS produces binomial counts along the genome, i.e., the number of sequenced reads that match the SNP of the parental reference strain, which is a proxy for the number of individuals in the offspring that share the SNP with that parent. Genomic loci associated with the trait can thus be discovered by analyzing trends in the counts along the genome. We exploit the link between smoothing splines and generalized mixed models to estimate the underlying structure in the SNP scatterplots.
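    The idea of "analyzing trends in the counts along the genome" can be sketched with a much simpler smoother than the authors' smoothing-spline/mixed-model approach. The sketch below swaps in a depth-weighted Gaussian kernel (Nadaraya-Watson) smoother of the reference-allele read fraction, then flags positions whose smoothed fraction departs from a baseline. It assumes that at unlinked markers both parents are equally represented in the pool, so the expected reference fraction is 0.5 (as for haploid segregants from a two-parent cross). The bandwidth, the 0.2 threshold, the helper names and the toy counts are all hypothetical.

```python
import math


def smoothed_ref_frequency(positions, ref_counts, depths, bandwidth):
    """Depth-weighted Gaussian-kernel (Nadaraya-Watson) estimate of the
    reference-allele read fraction at each SNP position (hypothetical helper).

    Dividing sum(kernel * ref_count) by sum(kernel * depth) weights every
    SNP by its read depth, so deeply sequenced SNPs count more.
    """
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, k, n in zip(positions, ref_counts, depths):
            kern = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
            num += kern * k
            den += kern * n
        smoothed.append(num / den)
    return smoothed


def flag_candidate_loci(positions, smoothed, expected=0.5, threshold=0.2):
    """Positions whose smoothed fraction departs from the unlinked expectation."""
    return [x for x, f in zip(positions, smoothed) if abs(f - expected) > threshold]


# Toy input (synthetic, for illustration only): 100 SNPs with 40 reads each,
# balanced 20/40 everywhere except an enriched block at positions 40-60.
positions = list(range(100))
depths = [40] * 100
ref_counts = [36 if 40 <= p <= 60 else 20 for p in positions]

freq = smoothed_ref_frequency(positions, ref_counts, depths, bandwidth=3.0)
loci = flag_candidate_loci(positions, freq)
print(min(loci), max(loci))
```

A kernel smoother treats the counts only through their ratio and depth weights; the spline/mixed-model formulation described in the abstract instead models the binomial counts directly and chooses the amount of smoothing from the data, which this sketch does not attempt.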