
    Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

    For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a budget-sensitive progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly optimal) classification performance.
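The question of how to represent the classes when only n examples can be selected can be made concrete with a small sketch. The Python function below draws a fixed-size training set with a chosen positive-class fraction. It is not the budget-sensitive progressive sampling algorithm from the article, only an illustration of holding n fixed while varying the class distribution. The function name, its parameters, and the toy label pool are assumptions made for this example.

```python
import random
from collections import Counter


def sample_with_class_fraction(labels, n, positive_fraction, positive_label=1, seed=0):
    """Return indices of n examples with about positive_fraction positives.

    Hypothetical helper for illustration: binary labels are assumed, and
    sampling is done without replacement.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == positive_label]
    negatives = [i for i, y in enumerate(labels) if y != positive_label]

    n_pos = round(n * positive_fraction)
    n_neg = n - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples of one class for this distribution")

    chosen = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(chosen)
    return chosen


# Toy pool with 10% positives (the "natural" distribution in this made-up example).
pool = [1] * 100 + [0] * 900

natural = sample_with_class_fraction(pool, n=200, positive_fraction=0.10)
balanced = sample_with_class_fraction(pool, n=200, positive_fraction=0.50)

print(Counter(pool[i] for i in natural))   # 20 positives, 180 negatives
print(Counter(pool[i] for i in balanced))  # 100 positives, 100 negatives
```

Both training sets have the same size, so any difference in the resulting classifiers would come from the class distribution alone, which is the comparison the abstract describes.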

    Through the Eye of the Beholder: Multiple Perspectives on Quality in Women's Health Care

    Quality is an elusive concept with different meanings to different people. Providers often define quality in terms of patient outcomes, professional standards of practice, predetermined criteria used to measure quality, and even subjective opinion. Patients describe quality in terms of the interpersonal aspects of care, how well they were treated, and the responsiveness of the provider to their needs. This qualitative study used a semi-structured interview to define quality from the perspectives of patients, physicians, nurses, and payers associated with a hospital-based women's service line, and to examine how the attributes of quality varied among these groups. The study also described how stakeholders become aware of quality and how they determine a hospital's quality. From the findings of the study, a conceptual framework of quality in women's health was developed.

    Improved electro-optical tracking system

    The electro-optical tracking system employs a laser beam illuminating source, an electronic laser beam deflector, and an image dissector photomultiplier. An electronic scanning transmitter and receiver follow rapid movements or accelerations of the target.

    Modelling CO emission from hydrodynamic simulations of nearby spirals, starbursting mergers, and high-redshift galaxies

    We model the intensity of emission lines from the CO molecule, based on hydrodynamic simulations of spirals, mergers, and high-redshift galaxies with very high resolutions (3 pc and 10^3 Msun) and detailed models for the phase-space structure of the interstellar gas including shock heating, stellar feedback processes and galactic winds. The simulations are analyzed with a Large Velocity Gradient (LVG) model to compute the local emission in various molecular lines in each resolution element, radiation transfer and opacity effects, and the intensity emerging from galaxies, to generate synthetic spectra for various transitions of the CO molecule. This model reproduces the known properties of CO spectra and CO-to-H2 conversion factors in nearby spirals and starbursting major mergers. The high excitation of CO lines in mergers is dominated by an excess of high-density gas, and the high turbulent velocities and compression that create this dense gas excess result in broad linewidths and low CO intensity-to-H2 mass ratios. When applied to high-redshift gas-rich disk galaxies, the same model predicts that their CO-to-H2 conversion factor is almost as high as in nearby spirals, and much higher than in starbursting mergers. High-redshift disk galaxies contain giant star-forming clumps that host a high-excitation component associated with gas warmed by the spatially concentrated stellar feedback sources, although CO(1-0) to CO(3-2) emission is overall dominated by low-excitation gas around the densest clumps. These results overall highlight a strong dependence of CO excitation and the CO-to-H2 conversion factor on galaxy type, even at similar star formation rates or densities. The underlying processes are driven by the interstellar medium structure and turbulence and its response to stellar feedback, which depend on global galaxy structure and in turn impact the CO emission properties. Comment: A&A in press

    Meteorite cloudy zone formation as a quantitative indicator of paleomagnetic field intensities and cooling rates on planetesimals

    Metallic microstructures in slowly-cooled iron-rich meteorites reflect the thermal and magnetic histories of their parent planetesimals. Of particular interest is the cloudy zone, a nanoscale intergrowth of Ni-rich islands within a Ni-poor matrix that forms below 350 °C by spinodal decomposition. The sizes of the islands have long been recognized as reflecting the low-temperature cooling rates of meteorite parent bodies. However, a model capable of providing quantitative cooling rate estimates from island sizes has been lacking. Moreover, these islands are also capable of preserving a record of the ambient magnetic field as they grew, but some of the key physical parameters required for recovering reliable paleointensity estimates from magnetic measurements of these islands have been poorly constrained. To address both of these issues, we present a numerical model of the structural and compositional evolution of the cloudy zone as a function of cooling rate and local composition. Our model produces island sizes that are consistent with present-day measured sizes. This model enables a substantial improvement in the calibration of paleointensity estimates and associated uncertainties. In particular, we can now accurately quantify the statistical uncertainty associated with the finite number of islands and the uncertainty on their size at the time of the record. We use this new understanding to revisit paleointensities from previous pioneering paleomagnetic studies of cloudy zones. We show that these could have been overestimated but nevertheless still require substantial magnetic fields to have been present on their parent bodies. Our model also allows us to estimate absolute cooling rates for meteorites that cooled slower than 10000 °C My^-1. We demonstrate how these cooling rate estimates can uniquely constrain the low-temperature thermal history of meteorite parent bodies. Comment: Manuscript resubmitted after revision

    Thermodynamics of a subensemble of a canonical ensemble

    Two approaches to describe the thermodynamics of a subsystem that interacts with a thermal bath are considered. Within the first approach, the mean system energy $E_{S}$ is identified with the expectation value of the system Hamiltonian, which is evaluated with respect to the overall (system+bath) equilibrium distribution. Within the second approach, the system partition function $Z_{S}$ is considered as the fundamental quantity, which is postulated to be the ratio of the overall (system+bath) and the bath partition functions, and the standard thermodynamic relation $E_{S}=-d(\ln Z_{S})/d\beta$ is used to obtain the mean system energy ($\beta\equiv 1/(k_{B}T)$, where $k_{B}$ is the Boltzmann constant and $T$ is the temperature). Employing both classical and quantum mechanical treatments, the advantages and shortcomings of the two approaches are analyzed in detail for various different systems. It is shown that already within classical mechanics both approaches predict significantly different results for thermodynamic quantities provided the system-bath interaction is not bilinear or the system of interest consists of more than a single particle. Based on the results, it is concluded that the first approach is superior.
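The standard relation between mean energy and the logarithm of the partition function, which the second approach relies on, can be checked numerically on a textbook case. The sketch below uses an isolated two-level system with energies 0 and `gap`. This is an assumption chosen for illustration and is not one of the system-bath models analysed in the paper. It compares the closed-form mean energy with a finite-difference estimate of the negative derivative of ln Z with respect to beta. All function names are hypothetical.

```python
import math


def log_partition(beta, gap=1.0):
    """ln Z for a two-level system with energies 0 and gap."""
    return math.log(1.0 + math.exp(-beta * gap))


def mean_energy_exact(beta, gap=1.0):
    """Closed-form mean energy: gap / (exp(beta * gap) + 1)."""
    return gap / (math.exp(beta * gap) + 1.0)


def mean_energy_from_log_z(beta, gap=1.0, h=1e-5):
    """E = -d(ln Z)/d(beta), estimated with a central finite difference."""
    derivative = (log_partition(beta + h, gap) - log_partition(beta - h, gap)) / (2.0 * h)
    return -derivative


beta = 2.0  # inverse temperature in units where gap = 1
exact = mean_energy_exact(beta)
numeric = mean_energy_from_log_z(beta)
print(exact, numeric)  # the two values should agree to many decimal places
```

For an isolated system the two routes to the mean energy coincide. The abstract's point is that they can differ once the partition function is defined as a ratio of system+bath and bath partition functions.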

    Paraunitary oversampled filter bank design for channel coding

    Oversampled filter banks (OSFBs) have been considered for channel coding, since their redundancy can be utilised to permit the detection and correction of channel errors. In this paper, we propose an OSFB-based channel coder for a correlated additive Gaussian noise channel, of which the noise covariance matrix is assumed to be known. Based on a suitable factorisation of this matrix, we develop a design for the decoder's synthesis filter bank in order to minimise the noise power in the decoded signal, subject to admitting perfect reconstruction through paraunitarity of the filter bank. We demonstrate that this approach can lead to a significant reduction of the noise interference by exploiting both the correlation of the channel and the redundancy of the filter banks. Simulation results provide some insight into these mechanisms.

    Statistics of statisticians: Critical mass of statistics and operational research groups in the UK

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the quality of statistics and operational research groups and the quantity of researchers in them. As in other academic disciplines, we provide evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be $9 \pm 3$. The upper critical mass, beyond which research quality does not significantly depend on group size, is about twice this value.
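The shape of the quality-versus-size relationship described here can be sketched in a few lines of Python. The lower critical mass of 9 and the factor of two come from the abstract. Everything else is an assumption made for illustration: the exactly flat plateau is a simplification of "does not significantly depend on group size", and `slope`, `intercept`, and the function names are hypothetical placeholders rather than fitted values.

```python
LOWER_CRITICAL_MASS = 9                         # estimate from the abstract (uncertainty of about 3)
UPPER_CRITICAL_MASS = 2 * LOWER_CRITICAL_MASS   # "about twice this value"


def group_quality(size, slope=1.0, intercept=0.0):
    """Toy quality score: linear in group size up to the upper critical mass,
    then flat. slope and intercept are hypothetical, not fitted values.
    """
    effective_size = min(size, UPPER_CRITICAL_MASS)
    return intercept + slope * effective_size


def size_regime(size):
    """Classify a group size relative to the two critical masses."""
    if size < LOWER_CRITICAL_MASS:
        return "below lower critical mass"
    if size <= UPPER_CRITICAL_MASS:
        return "between critical masses"
    return "above upper critical mass"


for size in (5, 12, 18, 30):
    print(size, size_regime(size), group_quality(size))
```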