    The Impact of Measurement Error in Regression Models Using Police Recorded Crime Rates

    Objectives: Assess the extent to which measurement error in police recorded crime rates impacts the estimates of regression models exploring the causes and consequences of crime. Methods: We focus on linear models where crime rates are included either as the response or as an explanatory variable, in their original scale or log-transformed. Two measurement error mechanisms are considered: systematic errors in the form of under-recorded crime, and random errors in the form of recording inconsistencies across areas. The extent to which such measurement error mechanisms impact model parameters is demonstrated algebraically using formal notation, and graphically using simulations. Results: The impact of measurement error is highly variable across settings. Depending on the crime type, the spatial resolution, and where and how police recorded crime rates enter the model, the measurement-error-induced biases range from negligible to severe, affecting even estimates for explanatory variables free of measurement error. We also demonstrate how, in models where crime rates are introduced as the response variable, the impact of measurement error can be eliminated using log-transformations. Conclusions: The validity of a large share of the evidence base exploring the causes and consequences of crime is put into question. In interpreting findings from the literature relying on regression models and police recorded crime rates, we urge researchers to consider the biasing effects shown here. Future studies should also anticipate the impact on their findings and employ sensitivity analyses if the expected measurement-error-induced bias is non-negligible.
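
    The log-transformation result can be made concrete with a minimal simulation sketch (our own illustration, not code from the paper), assuming under-recording acts as a constant multiplicative factor on the true rate: after taking logs, the factor shifts only the intercept, leaving the slope estimate unbiased.

        # Minimal sketch (not from the paper): constant multiplicative
        # under-recording of a response crime rate is absorbed by the
        # intercept after log-transformation; the slope stays unbiased.
        import numpy as np

        rng = np.random.default_rng(42)
        n = 5000
        x = rng.normal(size=n)                    # error-free covariate
        true_rate = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=n))
        recorded = 0.6 * true_rate                # 40% of crime goes unrecorded

        # OLS of log(recorded rate) on x via least squares
        X = np.column_stack([np.ones(n), x])
        beta = np.linalg.lstsq(X, np.log(recorded), rcond=None)[0]
        print(beta)  # intercept ~ 1.0 + log(0.6); slope ~ 0.5, unaffected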

    Entanglement of two-mode Gaussian states: characterization and experimental production and manipulation

    A powerful theoretical framework has emerged in recent years for the characterization and quantification of entanglement in continuous-variable systems. After reviewing this framework, we illustrate it with an original set-up based on a type-II OPO with adjustable mode coupling. Experimental results allow a direct verification of many theoretical predictions and provide sharp insight into the general properties of two-mode Gaussian states and entanglement resource manipulation.
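
    One concrete piece of that quantification framework is the logarithmic negativity. The sketch below (our own illustration, assuming the convention that the vacuum quadrature variance equals 1) computes it from the smallest symplectic eigenvalue of the partially transposed covariance matrix, checked on a two-mode squeezed vacuum, where the result is known to equal twice the squeezing parameter.

        # Sketch: logarithmic negativity of a two-mode Gaussian state,
        # a standard entanglement measure. Convention: vacuum covariance
        # matrix = identity, block ordering (x1, p1, x2, p2).
        import numpy as np

        def log_negativity(sigma):
            """E_N = max(0, -ln nu_minus) for a 4x4 covariance matrix
            sigma = [[A, C], [C.T, B]]."""
            A, B, C = sigma[:2, :2], sigma[2:, 2:], sigma[:2, 2:]
            # Symplectic invariant of the partially transposed state
            delta = np.linalg.det(A) + np.linalg.det(B) - 2 * np.linalg.det(C)
            nu_minus = np.sqrt((delta - np.sqrt(delta**2 - 4 * np.linalg.det(sigma))) / 2)
            return max(0.0, -np.log(nu_minus))

        r = 0.8  # squeezing parameter
        c, s = np.cosh(2 * r), np.sinh(2 * r)
        sigma = np.block([[c * np.eye(2), s * np.diag([1, -1])],
                          [s * np.diag([1, -1]), c * np.eye(2)]])
        print(log_negativity(sigma))  # equals 2r = 1.6 for this pure state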

    SDSS Standard Star Catalog for Stripe 82: the Dawn of Industrial 1% Optical Photometry

    We describe a standard star catalog constructed using multiple SDSS photometric observations (at least four per band, with a median of ten) in the ugriz system. The catalog includes 1.01 million non-variable unresolved objects from the equatorial Stripe 82 (|δ_J2000| < 1.266°) in the RA range 20h 34m to 4h 00m, with the corresponding r-band (approximately Johnson V-band) magnitudes in the range 14-22. The distributions of measurements for individual sources demonstrate that the photometric pipeline correctly estimates random photometric errors, which are below 0.01 mag for stars brighter than (19.5, 20.5, 20.5, 20, 18.5) in ugriz, respectively (about twice as good as for individual SDSS runs). Several independent tests of the internal consistency suggest that the spatial variation of photometric zeropoints is not larger than ~0.01 mag (rms). In addition to being the largest available dataset with optical photometry internally consistent at the ~1% level, this catalog provides a practical definition of the SDSS photometric system. Using this catalog, we show that photometric zeropoints for SDSS observing runs can be calibrated with a nominal uncertainty of 2% even for data obtained through 1 mag thick clouds, and demonstrate the existence of He and H white dwarf sequences using photometric data alone. Based on the properties of this catalog, we conclude that upcoming large-scale optical surveys such as the Large Synoptic Survey Telescope will be capable of delivering robust 1% photometry for billions of sources.
    Comment: 63 pages, 24 figures, submitted to AJ; version with correct figures and catalog available from http://www.astro.washington.edu/ivezic/sdss/catalogs/stripe82.htm
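
    The core averaging step behind such a catalog can be sketched as follows (our own illustration, not the SDSS pipeline): repeated epochs are combined by an inverse-variance weighted mean, and the reduced chi-square of the scatter about that mean, which should be near 1 when the reported errors are correct, flags variable sources for removal.

        # Sketch (not the SDSS pipeline): combine repeated photometric
        # measurements of one star into a weighted mean magnitude, and use
        # the scatter to separate variables from photometric standards.
        import numpy as np

        def combine_epochs(mags, errs, var_threshold=3.0):
            """mags, errs: per-epoch magnitudes and reported 1-sigma errors."""
            mags, w = np.asarray(mags), 1.0 / np.asarray(errs) ** 2
            mean = np.sum(w * mags) / np.sum(w)   # inverse-variance weighted mean
            mean_err = np.sqrt(1.0 / np.sum(w))   # error of the weighted mean
            # Reduced chi-square about the mean: ~1 if the pipeline's error
            # estimates are correct and the source is non-variable.
            chi2_red = np.sum(w * (mags - mean) ** 2) / (len(mags) - 1)
            return mean, mean_err, chi2_red < var_threshold  # True = keep

        mags = [18.02, 18.05, 17.99, 18.03]       # four r-band epochs
        errs = [0.02, 0.03, 0.02, 0.02]
        print(combine_epochs(mags, errs))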

    HoloDetect: Few-Shot Learning for Error Detection

    We introduce a few-shot learning framework for error detection. We show that data augmentation (a form of weak supervision) is key to training high-quality, ML-based error detection models that require minimal human involvement. Our framework consists of two parts: (1) an expressive model to learn rich representations that capture the inherent syntactic and semantic heterogeneity of errors; and (2) a data augmentation model that, given a small seed of clean records, uses dataset-specific transformations to automatically generate additional training data. Our key insight is to learn data augmentation policies from the noisy input dataset in a weakly supervised manner. We show that our framework detects errors with an average precision of ~94% and an average recall of ~93% across a diverse array of datasets that exhibit different types and amounts of errors. We compare our approach to a comprehensive collection of error detection methods, ranging from traditional rule-based methods to ensemble-based and active learning approaches. We show that data augmentation yields an average improvement of 20 F1 points while requiring access to 3x fewer labeled examples than other ML approaches.
    Comment: 18 pages
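
    The augmentation idea can be illustrated with a toy sketch (our own illustration, not the HoloDetect code; the transformations and their weights are hypothetical placeholders, whereas the paper learns them from the noisy dataset): a small seed of clean records is expanded into labeled clean/erroneous training pairs by applying noise transformations.

        # Toy sketch (not HoloDetect): expand a seed of clean records into
        # labeled training pairs via noise transformations, so an error
        # detector can be trained with minimal human labeling.
        import random

        def typo(v):         # character-level corruption
            i = random.randrange(len(v))
            return v[:i] + random.choice("abcdefghij") + v[i + 1:]

        def drop_token(v):   # simulate missing/truncated values
            parts = v.split()
            return " ".join(parts[:-1]) if len(parts) > 1 else ""

        # Hypothetical hard-coded weights; the paper learns such policies.
        TRANSFORMS = [(typo, 0.7), (drop_token, 0.3)]

        def augment(clean_records, n_per_record=3):
            """Return (value, label) pairs: label 1 = erroneous, 0 = clean."""
            out = [(r, 0) for r in clean_records]
            for r in clean_records:
                for _ in range(n_per_record):
                    f = random.choices([t for t, _ in TRANSFORMS],
                                       weights=[w for _, w in TRANSFORMS])[0]
                    out.append((f(r), 1))
            return out

        print(augment(["new york", "chicago"], n_per_record=2))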

    Multiple Approaches to Absenteeism Analysis

    Absenteeism research has often been criticized for using inappropriate analysis. Characteristics of absence data, notably that it is usually truncated and skewed, violate the assumptions of OLS regression; however, OLS and correlation analysis remain the dominant models of absenteeism research. This piece compares eight models that may be appropriate for analyzing absence data. Specifically, it discusses and uses OLS regression, OLS regression with a transformed dependent variable, the Tobit model, Poisson regression, overdispersed Poisson regression, the negative binomial model, ordinal logistic regression, and the ordinal probit model. A simulation methodology is employed to determine the extent to which each model is likely to produce false positives. Simulations vary with respect to the shape of the dependent variable's distribution, sample size, and the shape of the independent variables' distributions. Actual data, based on a sample of 195 manufacturing employees, are used to illustrate how these models might be applied to a real data set. Results from the simulation suggest that, despite methodological expectations, OLS regression does not produce significantly more false positives than expected at various alpha levels. However, the Tobit and Poisson models are often shown to yield too many false positives. A number of other models yield fewer than the expected number of false positives, suggesting that they may serve well as conservative hypothesis tests.
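
    The flavor of such a false-positive simulation can be sketched as follows (our own illustration, not the paper's code; the sample size echoes the 195-employee example, while the data-generating process and alpha are assumptions): regress a skewed absence count on a covariate that is unrelated to it by construction, and record how often each model rejects.

        # Sketch (not the paper's simulation): false-positive rates for OLS
        # and Poisson regression when a skewed, overdispersed absence count
        # is regressed on an unrelated covariate. n = 195 echoes the paper's
        # sample; the data-generating process and alpha are our assumptions.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        alpha, n_sims, n = 0.05, 1000, 195
        fp_ols = fp_pois = 0
        for _ in range(n_sims):
            x = rng.normal(size=n)
            # Negative-binomial draws: skewed, overdispersed, independent of x
            y = rng.negative_binomial(1, 1 / 3, size=n)
            X = sm.add_constant(x)
            fp_ols += sm.OLS(y, X).fit().pvalues[1] < alpha
            fp_pois += sm.GLM(y, X, family=sm.families.Poisson()).fit().pvalues[1] < alpha

        # Poisson standard errors ignore the overdispersion, so its
        # false-positive rate should exceed alpha; OLS stays near alpha.
        print("OLS:", fp_ols / n_sims, "Poisson:", fp_pois / n_sims)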