50 research outputs found

    Curriculum Guidelines for Undergraduate Programs in Data Science

    The Park City Math Institute 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in data science. The group consisted of 25 undergraduate faculty from a variety of institutions in the United States, primarily from the disciplines of mathematics, statistics, and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in data science

    Feature signature prediction of a boring process using neural network modeling with confidence bounds

    Prediction of machine tool failure has been very important in modern metal cutting operations in order to meet the growing demand for product quality and cost reduction. This paper presents the study of building a neural network model for predicting the behavior of a boring process during its full life cycle. This prediction is achieved by the fusion of the predictions of three principal components extracted as features from the joint time–frequency distributions of energy of the spindle loads observed during the boring process. Furthermore, prediction uncertainty is assessed using nonlinear regression in order to quantify the errors associated with the prediction. The results show that the implemented Elman recurrent neural network is a viable method for the prediction of the feature behavior of the boring process, and that the constructed confidence bounds provide information crucial for subsequent maintenance decision making based on the predicted cutting tool degradation. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/45845/1/170_2005_Article_114.pd

    Comprehensive analysis of correlation coefficients estimated from pooling heterogeneous microarray data

    Background: The synthesis of information across microarray studies has been performed by combining statistical results of individual studies (as in a mosaic), or by combining data from multiple studies into a large pool to be analyzed as a single data set (as in a melting pot of data). Specific issues relating to data heterogeneity across microarray studies, such as differences within and between labs or differences among experimental conditions, could lead to equivocal results in a melting pot approach. Results: We applied statistical theory to determine the specific effect of different means and heteroskedasticity across 19 groups of microarray data on the sign and magnitude of gene-to-gene Pearson correlation coefficients obtained from the pool of 19 groups. We quantified the biases of the pooled coefficients and compared them to the biases of correlations estimated by an effect-size model. Mean differences across the 19 groups were the main factor determining the magnitude and sign of the pooled coefficients, which showed largest values of bias as they approached ±1. Only heteroskedasticity across the pool of 19 groups resulted in less efficient estimations of correlations than did a classical meta-analysis approach of combining correlation coefficients. These results were corroborated by simulation studies involving either mean differences or heteroskedasticity across a pool of N > 2 groups. Conclusions: The combination of statistical results is best suited for synthesizing the correlation between expression profiles of a gene pair across several microarray studies.

    A High Throughput Genetic Screen Identifies New Early Meiotic Recombination Functions in Arabidopsis thaliana

    Meiotic recombination is initiated by the formation of numerous DNA double-strand breaks (DSBs) catalysed by the widely conserved Spo11 protein. In Saccharomyces cerevisiae, Spo11 requires nine other proteins for meiotic DSB formation; however, unlike Spo11, few of these are conserved across kingdoms. In order to investigate this recombination step in higher eukaryotes, we took advantage of a high-throughput meiotic mutant screen carried out in the model plant Arabidopsis thaliana. A collection of 55,000 mutant lines was screened, and spo11-like mutations, characterised by a drastic decrease in chiasma formation at metaphase I associated with an absence of synapsis at prophase, were selected. This screen led to the identification of two populations of mutants classified according to their recombination defects: mutants that repair meiotic DSBs using the sister chromatid such as Atdmc1 or mutants that are unable to make DSBs like Atspo11-1. We found that in Arabidopsis thaliana at least four proteins are necessary for driving meiotic DSB repair via the homologous chromosomes. These include the previously characterised DMC1 and the Hop1-related ASY1 proteins, but also the meiotic specific cyclin SDS as well as the Hop2 Arabidopsis homologue AHP2. Analysing the mutants defective in DSB formation, we identified the previously characterised AtSPO11-1, AtSPO11-2, and AtPRD1 as well as two new genes, AtPRD2 and AtPRD3. Our data thus increase the number of proteins necessary for DSB formation in Arabidopsis thaliana to five. Unlike SPO11 and (to a minor extent) PRD1, these two new proteins are poorly conserved among species, suggesting that the DSB formation mechanism, but not its regulation, is conserved among eukaryotes

    A Novel Protein Kinase-Like Domain in a Selenoprotein, Widespread in the Tree of Life

    Selenoproteins serve important functions in many organisms, usually providing essential oxidoreductase enzymatic activity, often for defense against toxic xenobiotic substances. Most eukaryotic genomes possess a small number of these proteins, usually not more than 20. Selenoproteins belong to various structural classes, often related to oxidoreductase function, yet a few of them are completely uncharacterised