The Impact of Measurement Error in Regression Models Using Police Recorded Crime Rates
Objectives
Assess the extent to which measurement error in police recorded crime rates impacts the estimates of regression models exploring the causes and consequences of crime.
Methods
We focus on linear models where crime rates are included either as the response or as an explanatory variable, in their original scale or log-transformed. Two measurement error mechanisms are considered: systematic errors in the form of under-recorded crime, and random errors in the form of recording inconsistencies across areas. The extent to which such measurement error mechanisms impact model parameters is demonstrated algebraically using formal notation and graphically using simulations.
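To make these mechanisms concrete, here is a minimal algebraic sketch in generic notation (our own illustration; the paper's derivations may differ in detail). Write the recorded rate as C* = rC for systematic under-recording with a constant recording rate 0 < r < 1:

```latex
% C* as the response in a level model C = \alpha + \beta X + \varepsilon:
\[
C^{*} = rC = r\alpha + (r\beta)X + r\varepsilon ,
\]
% so the slope is attenuated by the factor r, whereas after a log-transformation
\[
\log C^{*} = \log r + \log C = (\alpha' + \log r) + \beta' X + \varepsilon' ,
\]
% only the intercept shifts. For a mean-zero random error u added to an
% explanatory variable, the classical attenuation result applies:
\[
\operatorname{plim}\hat{\beta} = \beta\,\frac{\sigma_C^{2}}{\sigma_C^{2}+\sigma_u^{2}} .
\]
```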
Results
The impact of measurement error is highly variable across different settings. Depending on the crime type and the spatial resolution, but also on where and how police recorded crime rates are introduced in the model, the biases induced by measurement error could range from negligible to severe, affecting even estimates for explanatory variables free of measurement error. We also demonstrate how, in models where crime rates are introduced as the response variable, the impact of measurement error can be eliminated using log-transformations.
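A small simulation sketch of the log-transformation point (parameter values and variable names are our own, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, r = 10_000, 0.5, 0.6  # r = constant recording rate (under-recording)

# True crime rates from a log-linear model, then systematic under-recording.
x = rng.normal(size=n)
c = np.exp(1.0 + beta * x + rng.normal(scale=0.3, size=n))
c_star = r * c

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    return np.polyfit(x, y, 1)[0]

# On the raw scale the slope is scaled by r (biased toward zero)...
print(ols_slope(x, c_star) / ols_slope(x, c))  # ~0.6 == r

# ...while on the log scale the error only shifts the intercept.
print(ols_slope(x, np.log(c_star)))            # ~0.5 == beta
```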
Conclusions
The validity of a large share of the evidence base exploring the causes and consequences of crime is put into question. In interpreting findings from the literature relying on regression models and police recorded crime rates, we urge researchers to consider the biasing effects shown here. Future studies should also anticipate the impact on their findings and employ sensitivity analysis if the expected measurement error induced bias is non-negligible.
Entanglement of two-mode Gaussian states: characterization and experimental production and manipulation
A powerful theoretical framework has emerged in recent years for the
characterization and quantification of entanglement in continuous-variable
systems. After reviewing this framework, we will illustrate it with an original
set-up based on a type-II OPO with adjustable mode coupling. Experimental
results allow a direct verification of many theoretical predictions and provide
sharp insight into the general properties of two-mode Gaussian states and
entanglement resource manipulation.
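For orientation, the standard criterion this framework rests on can be stated compactly; this is textbook material (Simon's PPT criterion, in the vacuum-variance convention where the vacuum covariance matrix is one half the identity), not a result specific to this paper:

```latex
% Two-mode covariance matrix in block form:
\[
\sigma = \begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix},
\qquad
\tilde{\Delta} = \det A + \det B - 2\det C .
\]
% Smallest symplectic eigenvalue of the partially transposed state:
\[
\tilde{\nu}_{-}^{2}
= \tfrac{1}{2}\Bigl(\tilde{\Delta} - \sqrt{\tilde{\Delta}^{2} - 4\det\sigma}\Bigr).
\]
% The state is entangled iff \tilde{\nu}_{-} < 1/2, with the degree of
% entanglement quantified by the logarithmic negativity:
\[
E_{\mathcal{N}} = \max\bigl\{0,\, -\ln 2\tilde{\nu}_{-}\bigr\}.
\]
```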
SDSS Standard Star Catalog for Stripe 82: the Dawn of Industrial 1% Optical Photometry
We describe a standard star catalog constructed using multiple SDSS
photometric observations (at least four per band, with a median of ten) in the
ugriz system. The catalog includes 1.01 million non-variable unresolved
objects from the equatorial stripe 82 (|Dec| < 1.266 deg) in
the RA range 20h 34m to 4h 00m, and with the corresponding r band
(approximately Johnson V band) magnitudes in the range 14--22. The
distributions of measurements for individual sources demonstrate that the
photometric pipeline correctly estimates random photometric errors, which are
below 0.01 mag for stars brighter than (19.5, 20.5, 20.5, 20, 18.5) in ugriz,
respectively (about twice as good as for individual SDSS runs). Several
independent tests of the internal consistency suggest that the spatial
variation of photometric zeropoints is not larger than 0.01 mag (rms). In
addition to being the largest available dataset with optical photometry
internally consistent at the 1% level, this catalog provides a practical
definition of the SDSS photometric system. Using this catalog, we show that
photometric zeropoints for SDSS observing runs can be calibrated within a
nominal uncertainty of 2% even for data obtained through 1 mag thick clouds, and
demonstrate the existence of He and H white dwarf sequences using photometric
data alone. Based on the properties of this catalog, we conclude that upcoming
large-scale optical surveys such as the Large Synoptic Survey Telescope will be
capable of delivering robust 1% photometry for billions of sources.
Comment: 63 pages, 24 figures, submitted to AJ; version with correct figures
and catalog available from
http://www.astro.washington.edu/ivezic/sdss/catalogs/stripe82.htm
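The quoted ~2x gain over individual runs is consistent with simple error-of-the-mean arithmetic; a back-of-the-envelope sketch (measurement values are invented for illustration):

```python
import numpy as np

# Repeated measurements of one star in one band (mag) with per-epoch errors.
mags = np.array([18.02, 18.05, 17.98, 18.01])  # catalog minimum: 4 epochs
errs = np.array([0.02, 0.02, 0.02, 0.02])

# Inverse-variance weighted mean and its uncertainty.
w = 1.0 / errs**2
mean_mag = np.sum(w * mags) / np.sum(w)
mean_err = 1.0 / np.sqrt(np.sum(w))

print(mean_mag, mean_err)  # error ~ 0.02 / sqrt(4) = 0.01 mag

# With at least 4 epochs per band, purely random errors shrink by
# sqrt(4) = 2, matching "about twice as good as for individual SDSS runs".
```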
HoloDetect: Few-Shot Learning for Error Detection
We introduce a few-shot learning framework for error detection. We show that
data augmentation (a form of weak supervision) is key to training high-quality,
ML-based error detection models that require minimal human involvement. Our
framework consists of two parts: (1) an expressive model to learn rich
representations that capture the inherent syntactic and semantic heterogeneity
of errors; and (2) a data augmentation model that, given a small seed of clean
records, uses dataset-specific transformations to automatically generate
additional training data. Our key insight is to learn data augmentation
policies from the noisy input dataset in a weakly supervised manner. We show
that our framework detects errors with an average precision of ~94% and an
average recall of ~93% across a diverse array of datasets that exhibit
different types and amounts of errors. We compare our approach to a
comprehensive collection of error detection methods, ranging from traditional
rule-based methods to ensemble-based and active learning approaches. We show
that data augmentation yields an average improvement of 20 F1 points while it
requires access to 3x fewer labeled examples compared to other ML approaches.
Comment: 18 pages
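The augmentation idea can be illustrated with a deliberately simplified sketch; this is not HoloDetect's actual implementation, and the transformations and policy weights below are invented for illustration:

```python
import random

# A few clean seed records (attribute -> value).
seed = [
    {"city": "Chicago", "state": "IL", "zip": "60614"},
    {"city": "Boston",  "state": "MA", "zip": "02115"},
]

def drop_char(v):
    """Typo transformation: delete one random character."""
    i = random.randrange(len(v))
    return v[:i] + v[i + 1:]

def augment(record, policy):
    """Generate a (record, is_error) training example.

    The policy is a weighted choice over transformations; in a weakly
    supervised setting these weights would be learned from the noisy input.
    """
    kind = random.choices(["typo", "swap", "none"], weights=policy)[0]
    if kind == "typo":
        col = random.choice(list(record))
        return {**record, col: drop_char(record[col])}, 1
    if kind == "swap":  # misfielded value: swap two attributes
        return {**record, "city": record["state"], "state": record["city"]}, 1
    return record, 0

policy = [0.4, 0.2, 0.4]  # hypothetical learned weights
train = [augment(r, policy) for r in seed for _ in range(100)]
```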
Multiple Approaches to Absenteeism Analysis
Absenteeism research has often been criticized for using inappropriate analysis. Characteristics of absence data, notably that it is usually truncated and skewed, violate assumptions of OLS regression; however, OLS and correlation analysis remain the dominant models of absenteeism research. This piece compares eight models that may be appropriate for analyzing absence data. Specifically, this piece discusses and uses OLS regression, OLS regression with a transformed dependent variable, the Tobit model, Poisson regression, Overdispersed Poisson regression, the Negative Binomial model, Ordinal Logistic regression, and the Ordinal Probit model. A simulation methodology is employed to determine the extent to which each model is likely to produce false positives. Simulations vary with respect to the shape of the dependent variable's distribution, sample size, and the shape of the independent variables' distributions. Actual data, based on a sample of 195 manufacturing employees, is used to illustrate how these models might be used to analyze a real data set. Results from the simulation suggest that, despite methodological expectations, OLS regression does not produce significantly more false positives than expected at various alpha levels. However, the Tobit and Poisson models are often shown to yield too many false positives. A number of other models yield fewer than the expected number of false positives, thus suggesting that they may serve well as conservative hypothesis tests.
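The simulation logic is straightforward to reproduce in outline; a sketch under assumed distributions (not the authors' exact design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def false_positive_rate(n=195, reps=2000, alpha=0.05):
    """Share of null simulations in which OLS flags a nonexistent effect.

    Absences are drawn from a skewed count distribution independent of the
    predictor, so every rejection at level alpha is a false positive.
    """
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)                       # predictor, no true effect
        absences = rng.negative_binomial(1, 0.3, n)  # skewed counts, floor at 0
        result = stats.linregress(x, absences)
        hits += result.pvalue < alpha
    return hits / reps

print(false_positive_rate())  # near 0.05 if OLS p-values stay well calibrated
```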