Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
This paper introduces new algorithms and data structures for quick counting
for machine learning datasets. We focus on the counting task of constructing
contingency tables, but our approach is also applicable to counting the number
of records in a dataset that match conjunctive queries. Subject to certain
assumptions, the costs of these operations can be shown to be independent of
the number of records in the dataset and loglinear in the number of non-zero
entries in the contingency table. We provide a very sparse data structure, the
ADtree, to minimize memory use. We provide analytical worst-case bounds for
this structure for several models of data distribution. We empirically
demonstrate that tractably-sized data structures can be produced for large
real-world datasets by (a) using a sparse tree structure that never allocates
memory for counts of zero, (b) never allocating memory for counts that can be
deduced from other counts, and (c) not bothering to expand the tree fully near
its leaves. We show how the ADtree can be used to accelerate Bayes net
structure finding algorithms, rule learning algorithms, and feature selection
algorithms, and we provide a number of empirical results comparing ADtree
methods against traditional direct counting approaches. We also discuss the
possible uses of ADtrees in other machine learning methods, and discuss the
merits of ADtrees in comparison with alternative representations such as
kd-trees, R-trees and Frequent Sets. Comment: See http://www.jair.org/ for any accompanying file
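The counting task described in this abstract can be illustrated with a minimal Python sketch. It is not the ADtree: it is a flat cache of conjunctive-query counts that shows only idea (a), never storing a count of zero, and shows that a lookup, once the cache is built, does not touch the records again. The names `build_count_cache`, `count`, `contingency_table` and the toy `records` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def build_count_cache(records, max_arity=2):
    """Count every conjunction of up to max_arity (attribute, value) pairs.

    Only conjunctions that actually occur are stored, so zero counts
    never take up memory.
    """
    cache = Counter()
    for record in records:
        items = sorted(record.items())  # canonical order by attribute name
        for k in range(1, max_arity + 1):
            for combo in combinations(items, k):
                cache[combo] += 1
    return cache


def count(cache, query):
    """Number of records matching a conjunctive query such as {"a": 1, "b": 0}."""
    key = tuple(sorted(query.items()))
    return cache.get(key, 0)  # absent key means a count of zero


def contingency_table(cache, attributes):
    """Non-zero cells of the contingency table over the given attributes."""
    wanted = tuple(sorted(attributes))
    table = {}
    for key, n in cache.items():
        if tuple(attr for attr, _ in key) == wanted:
            table[tuple(value for _, value in key)] = n
    return table


# Toy dataset (assumed for illustration only).
records = [
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
    {"a": 0, "b": 0},
]
cache = build_count_cache(records)
```

With this toy data, `count(cache, {"a": 1, "b": 1})` looks up a single key, and `contingency_table(cache, ["a", "b"])` returns only the three cells that occur. The sketch caches every conjunction up to `max_arity`, which the ADtree avoids through ideas (b) and (c) in the abstract.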
A Search for Correlations between Gamma-Ray Burst Variability and Afterglow Onset
We compared the time (or time limit) of onset for optical afterglow emission
to the gamma-ray variability V in 76 GRBs with redshifts. In the subset (25
cases) with the rise evident in the data, we fit the shape of the onset peak as
well and compared the rising and decaying indices to V. We did not find any
evidence for any patterns between these properties and there is no statistical
support for any correlations. This indicates a lack of connection between
irregularities of the prompt gamma-ray emission and the establishment of the
afterglow phase. In the ordinary prompt internal shocks interpretation, this
would indicate a lack of relationship between V and the bulk Lorentz factor of
the event. Comment: 14 pages including 8 figures, MNRAS accepted
Tracking surface photovoltage dipole geometry in Bi2Se3 with time-resolved photoemission
Topological insulators have been shown to exhibit strong and long-lived surface photovoltages when excited by an infrared pump. The ability to generate long-lived potentials on these surfaces provides opportunities to manipulate the spin-momentum locked topological surface states. Moreover, the photo-induced nature of this effect allows for localized excitation of arbitrary geometries. Knowing precisely how these potentials form and evolve is critical in understanding how to manage the effect in applications. The uniqueness of the photoemission experimental geometry, in which the photoelectron must traverse the induced surface field in vacuum, provides an interesting probe of the electric dipole shape generated by the surface photovoltage. In this study, we are able to match the observed decay of the geometric effect on the photoelectron to an essential electrodynamics model of the light-induced dipole, thereby tracking the fluence-dependent evolution of the dipole geometry. By utilizing a standard time-resolved angle-resolved photoemission experiment, we are able to determine real-space information of the dipole while simultaneously recovering time-resolved band structure.
Recognising Desire: A psychosocial approach to understanding education policy implementation and effect
It is argued that in order to understand the ways in which teachers experience their work - including the idiosyncratic ways in which they respond to and implement mandated education policy - it is necessary to take account both of sociological and of psychological issues. The paper draws on original research with practising and beginning teachers, and on theories of social and psychic induction, to illustrate the potential benefits of this bipartisan approach for both teachers and researchers. Recognising the significance of (but somewhat arbitrary distinction between) structure and agency in teachers’ practical and ideological positionings, it is suggested that teachers’ responses to local and central policy changes are governed by a mix of pragmatism, social determinism and often hidden desires. It is the often underacknowledged strength of desire that may tip teachers into accepting and implementing policies with which they are not ideologically comfortable.
Tension is Dimension
We propose a simple universal formula for the tension of a D-brane in terms
of a regularized dimension of the associated conformal field theory statespace. Comment: 18 pages, harvmac (b), one ref added, one typo fixed
Comment on `Pressure of Hot QCD at large N_f'
It is argued why quasiparticle models can be useful to describe the
thermodynamics of hot QCD excluding, however, the case of a large number of
flavors, for which exact results have been calculated by Moore. Comment: 5 pages, 2 figures (version accepted for publication)
A novel approach to study realistic navigations on networks
We consider navigation or search schemes on networks which are realistic in
the sense that not all search chains can be completed. We show that the
quantity , where is the average dynamic shortest distance
and the success rate of completion of a search, is a consistent measure
for the quality of a search strategy. Taking the example of realistic searches
on scale-free networks, we find that scales with the system size as
, where decreases as the searching strategy is improved.
This measure is also shown to be sensitive to the distinguishing
characteristics of networks. In this new approach, a dynamic small world (DSW)
effect is said to exist when . We show that such a DSW indeed
exists in social networks in which the linking probability is dependent on
social distances. Comment: Text revised, references added; accepted version in Journal of
Statistical Mechanics
Limits on Lorentz Violation from the Highest Energy Cosmic Rays
We place several new limits on Lorentz violating effects, which can modify
particles' dispersion relations, by considering the highest energy cosmic rays
observed. Since these are hadrons, this involves considering the partonic
content of such cosmic rays. We get a number of bounds on differences in
maximum propagation speeds, which are typically bounded at the 10^{-21} level,
and on momentum dependent dispersion corrections of the form v = 1 +-
p^2/Lambda^2, which typically bound Lambda > 10^{21} GeV, well above the Planck
scale. For (CPT violating) dispersion correction of the form v = 1 + p/Lambda,
the bounds are up to 15 orders of magnitude beyond the Planck scale. Comment: 24 pages, no figures. Added references, very slight changes. Version
published in Physical Review
Genomic dissection of the 1994 Cronobacter sakazakii outbreak in a French neonatal intensive care unit
Background: Cronobacter sakazakii is a member of the genus Cronobacter that has frequently been isolated from powdered infant formula (PIF) and linked with rare but fatal neonatal infections such as meningitis and necrotising enterocolitis. The Cronobacter MLST scheme has reported over 400 sequence types and 42 clonal complexes; however C. sakazakii clonal complex 4 (CC4) has been linked strongly with neonatal infections, especially meningitis. There have been a number of reported Cronobacter outbreaks over the last three decades. The largest outbreak of C. sakazakii was in a neonatal intensive care unit (NICU) in France (1994) that lasted over 3 months and claimed the lives of three neonates. The present study used whole genome sequencing data of 26 isolates obtained from this outbreak to reveal their relatedness. This study is the first of its kind to use whole genome sequencing data to analyse a Cronobacter outbreak. Methods: Whole genome sequencing data was generated for 26 C. sakazakii isolates on the Illumina MiSeq platform. The whole genome phylogeny was determined using Mugsy and RaxML. SNP calls were determined using SMALT and SAMtools, and filtered using VCFtools. Results: The whole genome phylogeny suggested 3 distant clusters of C. sakazakii isolates were associated with the outbreak. SNP typing and phylogeny indicate the source of the C. sakazakii could have been from extrinsic contamination of reconstituted infant formula from the NICU environment and personnel. This pool of strains would have contributed to the prolonged duration of the outbreak, which was up to 3 months. Furthermore 3 neonates were co-infected with C. sakazakii from two different genotype clusters. Conclusion: The genomic investigation revealed the outbreak consisted of a heterogeneous population of C. sakazakii isolates. The source of the outbreak was not identified, but was probably due to environmental and personnel reservoirs resulting in extrinsic contamination of the neonatal feeds. It also indicated that C. sakazakii isolates from different genotype clusters have the ability to co-infect neonates.
