79 research outputs found

    A Study of the Drop-outs from the Watertown High School, Grades IX Through XII, During the Five School Years 1946 to 1951

    The problem of encouraging pupils to remain in high school until graduation is of major importance not only to the schools but to the community as well. It has been said that the public schools, and especially the secondary schools, have continued to be highly selective institutions. Adjustments within the school have not, in many cases, kept pace with changing school enrollments. Today we find a widely divergent student body necessitating wider ranges of experience to meet the needs of these individuals if we are to keep them in school until graduation. The schools, charged with the responsibility of preparing youth for citizenship and effective living in this democracy, cannot afford to allow these boys and girls to leave school before they have had the essential minimum of training provided in the twelve years of formal education. During several years of teaching in the Watertown Public Schools, the writer has observed the departure of a number of pupils prior to graduation. As an adviser to many of these pupils, the writer gave serious thought to the problem of keeping these pupils in school. Realizing the need for a more thorough knowledge of this problem, the writer conferred with D.W. Tieszen, the high-school principal, concerning the scope and procedures for the study. It was decided that a study of all accountable pupils withdrawing from school, yet remaining in the community for a period of at least six months, during the five-year period from September 1946 to June 1951, would give sufficient data for the study. The study includes all pupils dropping out of school during the ninth, tenth, eleventh, and twelfth grades as well as those pupils remaining to the end of the school year but failing to register at the beginning of the next school term. The study does not include any pupils dropping out prior to registration at the ninth-grade level, nor those pupils withdrawing from the parochial school within the city.

    Refinement algebra for probabilistic programs

    We identify a refinement algebra for reasoning about probabilistic program transformations in a total-correctness setting. The algebra is equipped with operators that determine whether a program is enabled or terminates, respectively. As well as developing the basic theory of the algebra, we demonstrate how it may be used to explain key differences and similarities between standard (i.e. non-probabilistic) and probabilistic programs, and we verify important transformation theorems for probabilistic action systems.
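
    For orientation only, and not as a reproduction of the paper's axioms: a refinement algebra treats programs as elements of an abstract algebra in which the refinement order is defined from demonic choice, and the enabledness and termination operators mentioned above map a program to a guard. A minimal sketch in LaTeX notation, where both displayed laws are illustrative assumptions rather than the paper's definitions:

    ```latex
    % Hedged sketch of the kind of signature a refinement algebra has
    % (illustrative notation; not the axioms of the cited paper).
    % Refinement order defined from demonic choice:
    \[ x \sqsubseteq y \iff x \sqcap y = x \]
    % Enabledness \epsilon x and termination \tau x yield guards; for example, one
    % would expect that restricting a program to its enabled states is a no-op:
    \[ \epsilon x \,;\, x = x \]
    ```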

    A robust calibration of the clumped isotopes to temperature relationship for foraminifers

    The clumped isotope (Δ47) proxy is a promising geochemical tool to reconstruct past ocean temperatures far back in time and in unknown settings, due to its unique thermodynamic basis that renders it independent of other environmental factors like seawater composition. Although previously hampered by large sample-size requirements, recent methodological advances have made the paleoceanographic application of Δ47 on small (<5 mg) foraminifer samples possible. Previous studies show a reasonable match between Δ47 calibrations based on synthetic carbonate and various species of planktonic foraminifers. However, studies performed before recent methodological advances were based on relatively few species and data treatment that is now outdated. To overcome these limitations and elucidate species-specific effects, we analyzed 14 species of planktonic foraminifers in sediment surface samples from 13 sites, covering a growth temperature range of ∼0–28 °C. We selected mixed-layer-dwelling and deep-dwelling species from a wide range of ocean settings to evaluate the feasibility of temperature reconstructions for different water depths. Various techniques to estimate foraminifer calcification temperatures were tested in order to assess their effects on the calibration and to find the most suitable approach. Results from this study generally confirm previous findings that there are no species-specific effects on the Δ47-temperature relationship in planktonic foraminifers, with one possible exception: various morphotypes of Globigerinoides ruber were often found to deviate from the general trend determined for planktonic foraminifers. Our data are in excellent agreement with a recent foraminifer calibration study that was performed with a different analytical setup, as well as with a calibration based exclusively on benthic foraminifers. A combined, methodologically homogenized dataset also reveals very good agreement with an inorganic calibration based on travertines. Our findings highlight the potential of the Δ47 paleothermometer to be applied to recent and extinct species alike to study surface ocean temperatures as well as thermocline variability for a multitude of settings and time scales.
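
    For readers unfamiliar with how such calibrations are applied, Δ47 paleothermometry typically rests on a linear relationship between Δ47 and 1/T² (T in kelvin). The sketch below uses made-up coefficients and data purely for illustration (the study's actual regression approach and values are in the paper) to show how a calibration line might be fit and then inverted to estimate temperature:

    ```python
    # Hedged sketch: fitting a generic Delta47-vs-1/T^2 calibration line and
    # inverting it to estimate temperature. All numbers are illustrative
    # placeholders, not values from the study.
    import numpy as np

    # Example inputs: known growth temperatures (deg C) and measured Delta47 (permil)
    temps_c = np.array([0.5, 5.0, 12.0, 18.0, 24.0, 28.0])
    delta47 = np.array([0.740, 0.725, 0.702, 0.684, 0.668, 0.658])  # illustrative

    x = 1.0e6 / (temps_c + 273.15) ** 2          # regress against 10^6 / T^2 (T in K)
    slope, intercept = np.polyfit(x, delta47, 1) # ordinary least-squares line

    def delta47_to_temp_c(d47):
        """Invert the fitted line: Delta47 = slope * 10^6/T^2 + intercept."""
        t_kelvin = np.sqrt(1.0e6 * slope / (d47 - intercept))
        return t_kelvin - 273.15

    print(delta47_to_temp_c(0.70))  # temperature implied by a Delta47 of 0.70 permil
    ```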

    Word correlation matrices for protein sequence analysis and remote homology detection

    Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive.
    Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection.
    Conclusion: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.
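
    As a rough illustration of the general idea, and not the paper's exact formulation: a word-based kernel can be computed by mapping each sequence to its overlapping fixed-length words (k-mers) and averaging a word-level similarity over all word pairs from the two sequences. In the sketch below, the word length k and the identity-based word similarity are assumptions made for illustration:

    ```python
    # Hedged sketch of an average word-similarity kernel for protein sequences.
    # Word length k and the match-fraction word similarity are illustrative
    # choices, not necessarily those used in the paper.
    from itertools import product

    def words(seq, k=3):
        """All overlapping words (k-mers) of a sequence."""
        return [seq[i:i + k] for i in range(len(seq) - k + 1)]

    def word_sim(w1, w2):
        """Similarity of two equal-length words: fraction of matching positions."""
        return sum(a == b for a, b in zip(w1, w2)) / len(w1)

    def word_correlation_kernel(seq_a, seq_b, k=3):
        """Average pairwise word similarity between two sequences."""
        wa, wb = words(seq_a, k), words(seq_b, k)
        return sum(word_sim(u, v) for u, v in product(wa, wb)) / (len(wa) * len(wb))

    print(word_correlation_kernel("MKVLAAGIV", "MKVLSAGIA"))
    ```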

    Learning a peptide-protein binding affinity predictor with kernel ridge regression

    We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of accurately predicting the binding affinity of any peptide to any protein. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. On all benchmarks, our method significantly (p-value < 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. The method should be of value to a large segment of the research community with the potential to accelerate peptide-based drug and vaccine development.
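
    To make the learning setup concrete, the sketch below plugs a precomputed string kernel into kernel ridge regression. The simple shared-3-mer kernel, the toy peptides, and the affinity values are illustrative stand-ins, not the specialized kernel or the datasets described in the abstract:

    ```python
    # Hedged sketch: kernel ridge regression on peptides with a precomputed string
    # kernel. The kernel (shared 3-mer counts) and the toy data are placeholders.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def kmer_counts(seq, k=3):
        counts = {}
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] = counts.get(seq[i:i + k], 0) + 1
        return counts

    def string_kernel(a, b, k=3):
        """Inner product of k-mer count vectors (a simple spectrum-style kernel)."""
        ca, cb = kmer_counts(a, k), kmer_counts(b, k)
        return sum(ca[w] * cb.get(w, 0) for w in ca)

    peptides = ["ACDEFGHIK", "ACDEFGHLK", "WYYACDEFG", "KLMNPQRST"]
    affinity = np.array([7.2, 7.0, 5.5, 4.1])  # toy binding affinities

    K = np.array([[string_kernel(a, b) for b in peptides] for a in peptides])
    model = KernelRidge(alpha=1.0, kernel="precomputed").fit(K, affinity)

    query = "ACDEFGHIR"
    k_query = np.array([[string_kernel(query, b) for b in peptides]])
    print(model.predict(k_query))  # predicted affinity for the query peptide
    ```

    Using a precomputed Gram matrix keeps the regressor independent of the kernel's implementation, so any string kernel that yields a valid kernel matrix can be swapped in without changing the learning code.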

    P-value based visualization of codon usage data

    Two important and not yet solved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for visualization of multivariate data have been shown to be effective tools for codon usage analysis. We here propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for the comparison of two distributions. Experimental results on four microbial genomes indicate that the new method is well-suited for the analysis of horizontal gene transfer and translational selection. Compared with the widely-used correspondence analysis, our method did not suffer from outlier sensitivity and showed a better clustering of putative alien genes in most cases.
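
    A hedged sketch of the overall pipeline follows; the toy codon counts and the specific P-value-to-dissimilarity transform (here 1 − p) are assumptions for illustration, not necessarily the paper's exact choices:

    ```python
    # Hedged sketch: P-value based dissimilarities between codon usage tables,
    # embedded in 2D with metric MDS. Counts and the 1 - p transform are toy choices.
    import numpy as np
    from scipy.stats import chi2_contingency
    from sklearn.manifold import MDS

    # Rows: genes; columns: counts of (a reduced set of) codons, invented numbers.
    codon_counts = np.array([
        [30, 12,  5, 20,  8],
        [28, 14,  6, 18,  9],
        [ 5, 25, 30,  4, 12],
        [ 6, 27, 28,  5, 10],
    ])

    n = len(codon_counts)
    dissim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # 2 x n_codons contingency table: do the two genes share a codon distribution?
            _, p, _, _ = chi2_contingency(np.vstack([codon_counts[i], codon_counts[j]]))
            dissim[i, j] = dissim[j, i] = 1.0 - p  # high p-value -> similar usage

    coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dissim)
    print(coords)  # 2D positions for visualizing codon usage similarity
    ```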

    Gene prediction in metagenomic fragments: A large scale machine learning approach

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most of them require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions.
    Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that the open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large-scale training, our method provides fast single-fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, the method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability.
    Conclusion: Large-scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
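
    The two-stage idea can be sketched roughly as follows; the random toy data, the single discriminant score, and the small network are placeholders rather than the paper's actual feature construction and training setup:

    ```python
    # Hedged sketch of a two-stage gene classifier for sequence fragments:
    # stage 1: a linear discriminant compresses high-dimensional codon-usage
    #          features into a score; stage 2: a small neural network combines the
    #          score with ORF length and GC content. All data are random toy values.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n = 200
    codon_features = rng.random((n, 64))          # stand-in for monocodon usage vectors
    orf_length = rng.integers(60, 900, size=n)    # ORF length in bp (toy)
    gc_content = rng.uniform(0.3, 0.7, size=n)    # fragment GC content (toy)
    is_coding = rng.integers(0, 2, size=n)        # labels: 1 = encodes a protein (toy)

    # Stage 1: linear discriminant score from codon usage features.
    lda = LinearDiscriminantAnalysis().fit(codon_features, is_coding)
    discriminant_score = lda.decision_function(codon_features)

    # Stage 2: neural network combining the score with ORF length and GC content.
    stage2_features = np.column_stack([discriminant_score, orf_length, gc_content])
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(stage2_features, is_coding)

    # Probability that a candidate open reading frame encodes a protein.
    print(net.predict_proba(stage2_features[:3])[:, 1])
    ```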