The Impact of Measurement Error in Regression Models Using Police Recorded Crime Rates
Objectives
Assess the extent to which measurement error in police recorded crime rates impacts the estimates of regression models exploring the causes and consequences of crime.
Methods
We focus on linear models where crime rates are included either as the response or as an explanatory variable, in their original scale or log-transformed. Two measurement error mechanisms are considered: systematic errors in the form of under-recorded crime, and random errors in the form of recording inconsistencies across areas. The extent to which such measurement error mechanisms impact model parameters is demonstrated algebraically using formal notation and graphically using simulations.
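To make these mechanisms concrete, here is a minimal algebraic sketch in generic notation (our own illustration; the paper's derivations may differ in detail). Write the recorded rate as C* = rC for systematic under-recording with a constant recording rate 0 < r < 1:

```latex
% C* as the response in a level model C = \alpha + \beta X + \varepsilon:
\[
C^{*} = rC = r\alpha + (r\beta)X + r\varepsilon ,
\]
% so the slope is attenuated by the factor r, whereas after a log-transformation
\[
\log C^{*} = \log r + \log C = (\alpha' + \log r) + \beta' X + \varepsilon' ,
\]
% only the intercept shifts. For a mean-zero random error u added to an
% explanatory variable, the classical attenuation result applies:
\[
\operatorname{plim}\hat{\beta} = \beta\,\frac{\sigma_C^{2}}{\sigma_C^{2}+\sigma_u^{2}} .
\]
```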
Results
The impact of measurement error is highly variable across different settings. Depending on the crime type and the spatial resolution, but also on where and how police recorded crime rates are introduced in the model, the biases induced by measurement error could range from negligible to severe, affecting even estimates for explanatory variables free of measurement error. We also demonstrate how, in models where crime rates are introduced as the response variable, the impact of measurement error can be eliminated using log-transformations.
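A small simulation sketch of the log-transformation point (parameter values and variable names are our own, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, r = 10_000, 0.5, 0.6  # r = constant recording rate (under-recording)

# True crime rates from a log-linear model, then systematic under-recording.
x = rng.normal(size=n)
c = np.exp(1.0 + beta * x + rng.normal(scale=0.3, size=n))
c_star = r * c

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    return np.polyfit(x, y, 1)[0]

# On the raw scale the slope is scaled by r (biased toward zero)...
print(ols_slope(x, c_star) / ols_slope(x, c))  # ~0.6 == r

# ...while on the log scale the error only shifts the intercept.
print(ols_slope(x, np.log(c_star)))            # ~0.5 == beta
```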
Conclusions
The validity of a large share of the evidence base exploring the causes and consequences of crime is put into question. In interpreting findings from the literature relying on regression models and police recorded crime rates, we urge researchers to consider the biasing effects shown here. Future studies should also anticipate the impact on their findings and employ sensitivity analysis if the expected measurement error induced bias is non-negligible.
Entanglement of two-mode Gaussian states: characterization and experimental production and manipulation
A powerful theoretical framework has emerged in recent years for the
characterization and quantification of entanglement in continuous-variable
systems. After reviewing this framework, we will illustrate it with an original
set-up based on a type-II OPO with adjustable mode coupling. Experimental
results allow a direct verification of many theoretical predictions and provide
sharp insight into the general properties of two-mode Gaussian states and
entanglement resource manipulation.
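For orientation, the standard criterion this framework rests on can be stated compactly; this is textbook material (Simon's PPT criterion, in the vacuum-variance convention where the vacuum covariance matrix is one half the identity), not a result specific to this paper:

```latex
% Two-mode covariance matrix in block form:
\[
\sigma = \begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix},
\qquad
\tilde{\Delta} = \det A + \det B - 2\det C .
\]
% Smallest symplectic eigenvalue of the partially transposed state:
\[
\tilde{\nu}_{-}^{2}
= \tfrac{1}{2}\Bigl(\tilde{\Delta} - \sqrt{\tilde{\Delta}^{2} - 4\det\sigma}\Bigr).
\]
% The state is entangled iff \tilde{\nu}_{-} < 1/2, with the degree of
% entanglement quantified by the logarithmic negativity:
\[
E_{\mathcal{N}} = \max\bigl\{0,\, -\ln 2\tilde{\nu}_{-}\bigr\}.
\]
```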
SDSS Standard Star Catalog for Stripe 82: the Dawn of Industrial 1% Optical Photometry
We describe a standard star catalog constructed using multiple SDSS
photometric observations (at least four per band, with a median of ten) in the
ugriz system. The catalog includes 1.01 million non-variable unresolved
objects from the equatorial stripe 82 (|Dec| < 1.266 deg) in
the RA range 20h 34m to 4h 00m, and with the corresponding r band
(approximately Johnson V band) magnitudes in the range 14--22. The
distributions of measurements for individual sources demonstrate that the
photometric pipeline correctly estimates random photometric errors, which are
below 0.01 mag for stars brighter than (19.5, 20.5, 20.5, 20, 18.5) in ugriz,
respectively (about twice as good as for individual SDSS runs). Several
independent tests of the internal consistency suggest that the spatial
variation of photometric zeropoints is not larger than 0.01 mag (rms). In
addition to being the largest available dataset with optical photometry
internally consistent at the 1% level, this catalog provides a practical
definition of the SDSS photometric system. Using this catalog, we show that
photometric zeropoints for SDSS observing runs can be calibrated within a
nominal uncertainty of 2% even for data obtained through 1 mag thick clouds, and
demonstrate the existence of He and H white dwarf sequences using photometric
data alone. Based on the properties of this catalog, we conclude that upcoming
large-scale optical surveys such as the Large Synoptic Survey Telescope will be
capable of delivering robust 1% photometry for billions of sources.
Comment: 63 pages, 24 figures, submitted to AJ; version with correct figures
and catalog available from
http://www.astro.washington.edu/ivezic/sdss/catalogs/stripe82.htm
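The quoted ~2x gain over individual runs is consistent with simple error-of-the-mean arithmetic; a back-of-the-envelope sketch (measurement values are invented for illustration):

```python
import numpy as np

# Repeated measurements of one star in one band (mag) with per-epoch errors.
mags = np.array([18.02, 18.05, 17.98, 18.01])  # catalog minimum: 4 epochs
errs = np.array([0.02, 0.02, 0.02, 0.02])

# Inverse-variance weighted mean and its uncertainty.
w = 1.0 / errs**2
mean_mag = np.sum(w * mags) / np.sum(w)
mean_err = 1.0 / np.sqrt(np.sum(w))

print(mean_mag, mean_err)  # error ~ 0.02 / sqrt(4) = 0.01 mag

# With at least 4 epochs per band, purely random errors shrink by
# sqrt(4) = 2, matching "about twice as good as for individual SDSS runs".
```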
HoloDetect: Few-Shot Learning for Error Detection
We introduce a few-shot learning framework for error detection. We show that
data augmentation (a form of weak supervision) is key to training high-quality,
ML-based error detection models that require minimal human involvement. Our
framework consists of two parts: (1) an expressive model to learn rich
representations that capture the inherent syntactic and semantic heterogeneity
of errors; and (2) a data augmentation model that, given a small seed of clean
records, uses dataset-specific transformations to automatically generate
additional training data. Our key insight is to learn data augmentation
policies from the noisy input dataset in a weakly supervised manner. We show
that our framework detects errors with an average precision of ~94% and an
average recall of ~93% across a diverse array of datasets that exhibit
different types and amounts of errors. We compare our approach to a
comprehensive collection of error detection methods, ranging from traditional
rule-based methods to ensemble-based and active learning approaches. We show
that data augmentation yields an average improvement of 20 F1 points while it
requires access to 3x fewer labeled examples compared to other ML approaches.
Comment: 18 pages
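The augmentation idea can be illustrated with a deliberately simplified sketch; this is not HoloDetect's actual implementation, and the transformations and policy weights below are invented for illustration:

```python
import random

# A few clean seed records (attribute -> value).
seed = [
    {"city": "Chicago", "state": "IL", "zip": "60614"},
    {"city": "Boston",  "state": "MA", "zip": "02115"},
]

def drop_char(v):
    """Typo transformation: delete one random character."""
    i = random.randrange(len(v))
    return v[:i] + v[i + 1:]

def augment(record, policy):
    """Generate a (record, is_error) training example.

    The policy is a weighted choice over transformations; in a weakly
    supervised setting these weights would be learned from the noisy input.
    """
    kind = random.choices(["typo", "swap", "none"], weights=policy)[0]
    if kind == "typo":
        col = random.choice(list(record))
        return {**record, col: drop_char(record[col])}, 1
    if kind == "swap":  # misfielded value: swap two attributes
        return {**record, "city": record["state"], "state": record["city"]}, 1
    return record, 0

policy = [0.4, 0.2, 0.4]  # hypothetical learned weights
train = [augment(r, policy) for r in seed for _ in range(100)]
```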
Multiple Approaches to Absenteeism Analysis
Absenteeism research has often been criticized for using inappropriate analysis. Characteristics of absence data, notably that it is usually truncated and skewed, violate assumptions of OLS regression; however, OLS and correlation analysis remain the dominant models of absenteeism research. This piece compares eight models that may be appropriate for analyzing absence data. Specifically, this piece discusses and uses OLS regression, OLS regression with a transformed dependent variable, the Tobit model, Poisson regression, Overdispersed Poisson regression, the Negative Binomial model, Ordinal Logistic regression, and the Ordinal Probit model. A simulation methodology is employed to determine the extent to which each model is likely to produce false positives. Simulations vary with respect to the shape of the dependent variable's distribution, sample size, and the shape of the independent variables' distributions. Actual data, based on a sample of 195 manufacturing employees, is used to illustrate how these models might be used to analyze a real data set. Results from the simulation suggest that, despite methodological expectations, OLS regression does not produce significantly more false positives than expected at various alpha levels. However, the Tobit and Poisson models are often shown to yield too many false positives. A number of other models yield fewer than the expected number of false positives, thus suggesting that they may serve well as conservative hypothesis tests.
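The simulation logic is straightforward to reproduce in outline; a sketch under assumed distributions (not the authors' exact design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def false_positive_rate(n=195, reps=2000, alpha=0.05):
    """Share of null simulations in which OLS flags a nonexistent effect.

    Absences are drawn from a skewed count distribution independent of the
    predictor, so every rejection at level alpha is a false positive.
    """
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)                       # predictor, no true effect
        absences = rng.negative_binomial(1, 0.3, n)  # skewed counts, floor at 0
        result = stats.linregress(x, absences)
        hits += result.pvalue < alpha
    return hits / reps

print(false_positive_rate())  # near 0.05 if OLS p-values stay well calibrated
```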