    Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

    This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets. Comment: See http://www.jair.org/ for any accompanying files
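
    The sparse-counting idea in point (a) can be illustrated with a minimal sketch. This is not the ADtree itself; it is a plain hash-based contingency table, and the function names (`contingency_table`, `count_matching`) and the toy records are hypothetical, chosen only for illustration. Cells with a count of zero are never stored, and a conjunctive query is answered from the table rather than by rescanning the records.

```python
from collections import Counter

def contingency_table(records, attrs):
    """Count each observed combination of values for `attrs`.

    Combinations that never occur are simply absent, so no memory
    is spent on zero counts.
    """
    return Counter(tuple(r[a] for a in attrs) for r in records)

def count_matching(table, attrs, query):
    """Count records matching a conjunctive query {attribute: value}.

    The query may constrain any subset of `attrs`; the answer is a sum
    over the stored (non-zero) cells only.
    """
    idx = {a: i for i, a in enumerate(attrs)}
    return sum(n for key, n in table.items()
               if all(key[idx[a]] == v for a, v in query.items()))

# Toy dataset (hypothetical values).
records = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
    {"color": "red", "size": "S"},
]
attrs = ("color", "size")
table = contingency_table(records, attrs)
red_count = count_matching(table, attrs, {"color": "red"})
```

    In this sketch the cost of a query depends on the number of non-zero cells, not on the number of records, which is the property the abstract emphasises; the ADtree goes further by also omitting counts that can be deduced from other counts.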

    A Search for Correlations between Gamma-Ray Burst Variability and Afterglow Onset

    We compared the time (or time limit) of onset for optical afterglow emission to the gamma-ray variability V in 76 GRBs with redshifts. In the subset (25 cases) with the rise evident in the data, we fit the shape of the onset peak as well and compared the rising and decaying indices to V. We did not find any evidence for any patterns between these properties and there is no statistical support for any correlations. This indicates a lack of connection between irregularities of the prompt gamma-ray emission and the establishment of the afterglow phase. In the ordinary prompt internal shocks interpretation, this would indicate a lack of relationship between V and the bulk Lorentz factor of the event. Comment: 14 pages including 8 figures, MNRAS accepted

    Recognising Desire: A psychosocial approach to understanding education policy implementation and effect

    It is argued that in order to understand the ways in which teachers experience their work - including the idiosyncratic ways in which they respond to and implement mandated education policy - it is necessary to take account both of sociological and of psychological issues. The paper draws on original research with practising and beginning teachers, and on theories of social and psychic induction, to illustrate the potential benefits of this bipartisan approach for both teachers and researchers. Recognising the significance of (but somewhat arbitrary distinction between) structure and agency in teachers’ practical and ideological positionings, it is suggested that teachers’ responses to local and central policy changes are governed by a mix of pragmatism, social determinism and often hidden desires. It is the often underacknowledged strength of desire that may tip teachers into accepting and implementing policies with which they are not ideologically comfortable.

    Tension is Dimension

    We propose a simple universal formula for the tension of a D-brane in terms of a regularized dimension of the associated conformal field theory statespace. Comment: 18 pages, harvmac (b), one ref added, one typo fixed

    Comment on `Pressure of Hot QCD at large N_f'

    It is argued why quasiparticle models can be useful to describe the thermodynamics of hot QCD excluding, however, the case of a large number of flavors, for which exact results have been calculated by Moore. Comment: 5 pages, 2 figures (version accepted for publication)

    A novel approach to study realistic navigations on networks

    We consider navigation or search schemes on networks which are realistic in the sense that not all search chains can be completed. We show that the quantity $\mu = \rho/s_d$, where $s_d$ is the average dynamic shortest distance and $\rho$ the success rate of completion of a search, is a consistent measure for the quality of a search strategy. Taking the example of realistic searches on scale-free networks, we find that $\mu$ scales with the system size $N$ as $N^{-\delta}$, where $\delta$ decreases as the searching strategy is improved. This measure is also shown to be sensitive to the distinguishing characteristics of networks. In this new approach, a dynamic small world (DSW) effect is said to exist when $\delta \approx 0$. We show that such a DSW indeed exists in social networks in which the linking probability is dependent on social distances. Comment: Text revised, references added; accepted version in Journal of Statistical Mechanics
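
    The measure defined in this abstract, mu = rho / s_d, can be computed from a log of search attempts, as in the sketch below. It rests on two assumptions not stated in the abstract: each attempt is recorded as a hop count if it completed and as `None` if it failed, and s_d is averaged over completed searches only. The function name `search_quality` and the sample data are hypothetical.

```python
def search_quality(path_lengths):
    """Return (rho, s_d, mu) for a list of search attempts.

    path_lengths: one entry per attempted search; the number of hops if
    the search completed, or None if the chain was not completed.
    Assumption: s_d is the mean hop count over completed searches only.
    """
    completed = [n for n in path_lengths if n is not None]
    if not completed:
        # No search succeeded: success rate 0, distance undefined.
        return 0.0, float("inf"), 0.0
    rho = len(completed) / len(path_lengths)   # success rate
    s_d = sum(completed) / len(completed)      # average dynamic distance
    return rho, s_d, rho / s_d

# Five attempted searches, two of which failed (hypothetical data).
attempts = [3, None, 5, 4, None]
rho, s_d, mu = search_quality(attempts)
```

    A strategy that completes more searches (higher rho) or finds shorter paths (lower s_d) gets a larger mu, which is why the single number can rank strategies.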

    Limits on Lorentz Violation from the Highest Energy Cosmic Rays

    We place several new limits on Lorentz violating effects, which can modify particles' dispersion relations, by considering the highest energy cosmic rays observed. Since these are hadrons, this involves considering the partonic content of such cosmic rays. We get a number of bounds on differences in maximum propagation speeds, which are typically bounded at the 10^{-21} level, and on momentum dependent dispersion corrections of the form v = 1 +- p^2/Lambda^2, which typically bound Lambda > 10^{21} GeV, well above the Planck scale. For (CPT violating) dispersion correction of the form v = 1 + p/Lambda, the bounds are up to 15 orders of magnitude beyond the Planck scale. Comment: 24 pages, no figures. Added references, very slight changes. Version published in Physical Review

    Genomic dissection of the 1994 Cronobacter sakazakii outbreak in a French neonatal intensive care unit

    Background: Cronobacter sakazakii is a member of the genus Cronobacter that has frequently been isolated from powdered infant formula (PIF) and linked with rare but fatal neonatal infections such as meningitis and necrotising enterocolitis. The Cronobacter MLST scheme has reported over 400 sequence types and 42 clonal complexes; however, C. sakazakii clonal complex 4 (CC4) has been linked strongly with neonatal infections, especially meningitis. There have been a number of reported Cronobacter outbreaks over the last three decades. The largest outbreak of C. sakazakii was in a neonatal intensive care unit (NICU) in France (1994) that lasted over 3 months and claimed the lives of three neonates. The present study used whole genome sequencing data of 26 isolates obtained from this outbreak to reveal their relatedness. This study is the first of its kind to use whole genome sequencing data to analyse a Cronobacter outbreak. Methods: Whole genome sequencing data was generated for 26 C. sakazakii isolates on the Illumina MiSeq platform. The whole genome phylogeny was determined using Mugsy and RAxML. SNP calls were determined using SMALT and SAMtools, and filtered using VCFtools. Results: The whole genome phylogeny suggested 3 distant clusters of C. sakazakii isolates were associated with the outbreak. SNP typing and phylogeny indicate the source of the C. sakazakii could have been extrinsic contamination of reconstituted infant formula from the NICU environment and personnel. This pool of strains would have contributed to the prolonged duration of the outbreak, which was up to 3 months. Furthermore, 3 neonates were co-infected with C. sakazakii from two different genotype clusters. Conclusion: The genomic investigation revealed the outbreak consisted of a heterogeneous population of C. sakazakii isolates. The source of the outbreak was not identified, but was probably due to environmental and personnel reservoirs resulting in extrinsic contamination of the neonatal feeds. It also indicated that C. sakazakii isolates from different genotype clusters have the ability to co-infect neonates.