Multinomial goodness-of-fit: large sample tests with survey design correction and exact tests for small samples
A new Stata command called -mgof- is introduced. The command is used to compute distributional tests for discrete (categorical, multinomial) variables. Apart from classic large sample χ²-approximation tests based on Pearson's X², the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read 1984), large sample tests for complex survey designs and exact tests for small samples are supported. The complex survey correction is based on the approach by Rao and Scott (1981) and parallels the survey design correction used for independence tests in -svy:tabulate-. The exact tests are computed using Monte Carlo methods or exhaustive enumeration. An exact Kolmogorov-Smirnov test for discrete data is also provided.
Keywords: multinomial, goodness-of-fit, chi-squared, categorical data, exact tests, Monte Carlo, exhaustive enumeration, combinatorial algorithms, complex survey correction, power-divergence statistic, Kolmogorov-Smirnov, Benford's law
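As a rough illustration of the power-divergence family mentioned in the abstract, the following Python sketch computes the Cressie-Read statistic for a vector of observed and expected counts. It is not the -mgof- implementation; the function name `power_divergence` and its arguments are hypothetical, and the sketch covers only the plain large-sample statistic, without the survey correction or the exact tests.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic (illustrative sketch).

    lam = 1 gives Pearson's X^2 (when observed and expected have the same
    total); lam = 0 is handled as the limiting case, the likelihood-ratio
    statistic G^2.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if abs(lam + 1.0) < 1e-12:
        raise ValueError("lam = -1 is a separate limiting case, not covered here")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: 2 * sum(o * log(o / e)); cells with o = 0 contribute 0.
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    # General case; negative lam requires strictly positive observed counts.
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total


observed = [18, 22, 20]
expected = [20, 20, 20]
pearson_x2 = power_divergence(observed, expected, 1.0)
likelihood_ratio_g2 = power_divergence(observed, expected, 0.0)
```

Under the null hypothesis, both values would be compared against a χ² distribution with (number of categories − 1) degrees of freedom in the large-sample case.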
Benford's Law and Fraud Detection. Facts and Legends
Is Benford's law a good instrument to detect fraud in reports of statistical and scientific data? For a valid test the probability of "false positives" and "false negatives" has to be low. However, it is very doubtful whether the Benford distribution is an appropriate tool to discriminate between manipulated and non-manipulated estimates. Further research should focus more on the validity of the test and test results should be interpreted more carefully.
Keywords: Benford's law, fraud detection, false positive, false negative, regression coefficients
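For readers unfamiliar with the reference distribution under discussion, here is a minimal Python sketch of Benford's first-digit probabilities, P(d) = log10(1 + 1/d), together with a helper that tabulates the leading digits of a data set. The function names are hypothetical and the sketch is not taken from the paper. In line with the abstract's caution, a deviation of the tabulated counts from these probabilities would not by itself establish manipulation.

```python
import math
from collections import Counter


def benford_probs():
    """Benford's first-digit probabilities for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a nonzero number."""
    # In scientific notation the first character is the leading digit.
    return int(f"{abs(x):.15e}"[0])


def first_digit_counts(values):
    """Counts of leading digits 1..9, skipping zeros."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    return [counts.get(d, 0) for d in range(1, 10)]


probs = benford_probs()
counts = first_digit_counts([1, 12, 0.19, 250, -3.4])
```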
Sensitive Questions in Online Surveys: Experimental Results for the Randomized Response Technique (RRT) and the Unmatched Count Technique (UCT)
Gaining valid answers to so-called sensitive questions is an age-old problem in survey research. Various techniques have been developed to guarantee anonymity and minimize the respondent's feelings of jeopardy. Two such techniques are the randomized response technique (RRT) and the unmatched count technique (UCT). In this study we evaluate the effectiveness of different implementations of the RRT (using a forced-response design) in a computer-assisted setting and also compare the use of the RRT to that of the UCT. The techniques are evaluated according to various quality criteria, such as the prevalence estimates they provide, the ease of their use, and respondent trust in the techniques. Our results indicate that the RRTs are problematic with respect to several domains, such as the limited trust they inspire and non-response, and that the RRT estimates are unreliable due to a strong false "no" bias, especially for the more sensitive questions. The UCT, however, performed well compared to the RRTs on all the evaluated measures. The UCT estimates also had more face validity than the RRT estimates. We conclude that the UCT is a promising alternative to RRT in self-administered surveys and that future research should be directed towards evaluating and improving the technique.
Keywords: sensitive questions, online survey, randomized response technique, unmatched count technique, item count technique, methodological experiment
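The two techniques lead to simple textbook prevalence estimators, sketched below in Python. This is an illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, and the design probabilities in the example (truthful answer with probability 2/3, forced "yes" with probability 1/6) are made up, since the abstract does not report the actual design parameters.

```python
from statistics import mean


def rrt_forced_response_estimate(yes_share, p_truth, p_forced_yes):
    """Prevalence estimate under a forced-response design.

    The observed share of "yes" answers is
        yes_share = p_forced_yes + p_truth * prevalence,
    so the prevalence follows by solving for it.
    """
    return (yes_share - p_forced_yes) / p_truth


def uct_estimate(counts_long_list, counts_short_list):
    """Prevalence estimate from the unmatched count technique.

    The long list contains the sensitive item in addition to the items of
    the short list; the estimate is the difference in mean item counts.
    """
    return mean(counts_long_list) - mean(counts_short_list)


# Hypothetical numbers for illustration only.
rrt_prevalence = rrt_forced_response_estimate(0.25, p_truth=2 / 3, p_forced_yes=1 / 6)
uct_prevalence = uct_estimate([2, 3, 1, 2], [1, 2, 2, 1])
```

The forced-response formula also suggests why a false "no" bias matters: respondents who answer "no" despite a forced "yes" instruction lower the observed "yes" share and therefore pull the estimate downward.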
Entropy balancing as an estimation command
Entropy balancing is a popular reweighting technique that provides an alternative to approaches such as inverse probability weighting (IPW) based on a logit or probit model. Even if the balancing weights resulting from the procedure will be of primary interest in most applications, it is noteworthy that entropy balancing can be represented as a simple regression-like model. An advantage of treating entropy balancing as a parametric model is that it clarifies how the reweighting affects statistical inference. In this article I present a new Stata command called -ebalfit- that estimates such a model including the variance-covariance matrix of the estimated coefficients. The balancing weights are then obtained as model predictions. Variance estimation is based on influence functions, which can be stored for further use, for example, to obtain consistent standard errors for statistics computed from the reweighted data.
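To make the "weights as model predictions" idea concrete, the following Python sketch balances a single covariate: it searches for a coefficient such that weights proportional to exp(coefficient × x) reproduce a target mean in the reweighted group. This is a deliberately simplified, one-dimensional illustration and not the -ebalfit- algorithm; the function name, the Newton iteration, and the example data are assumptions, and variance estimation via influence functions is not covered.

```python
import math


def entropy_balance_1d(x_control, target_mean, tol=1e-10, max_iter=100):
    """Find lam such that weights proportional to exp(lam * x) hit target_mean.

    Returns (lam, weights), with weights normalized to sum to 1.
    Illustrative sketch for a single covariate only.
    """
    lam = 0.0
    for _ in range(max_iter):
        z = [lam * x for x in x_control]
        z_max = max(z)  # subtract the maximum for numerical stability
        raw = [math.exp(v - z_max) for v in z]
        total = sum(raw)
        weights = [r / total for r in raw]
        weighted_mean = sum(w * x for w, x in zip(weights, x_control))
        gap = weighted_mean - target_mean
        if abs(gap) < tol:
            return lam, weights
        # The derivative of the weighted mean w.r.t. lam is the weighted variance.
        weighted_var = sum(
            w * (x - weighted_mean) ** 2 for w, x in zip(weights, x_control)
        )
        if weighted_var <= 0:
            raise ValueError("covariate has no variation; cannot balance")
        lam -= gap / weighted_var  # Newton step
    raise RuntimeError("balancing did not converge")


# Hypothetical data: reweight values 1..5 so that their weighted mean is 3.5.
lam, weights = entropy_balance_1d([1, 2, 3, 4, 5], target_mean=3.5)
```

The target mean must lie strictly between the smallest and largest covariate value; otherwise no set of positive weights can achieve balance.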
ColrSpace: A Mata class for color management
ColrSpace is a class-based color management system implemented in Mata. It supports a wide variety of color spaces and translations among them, provides color generators and a large collection of named palettes, and features functionality such as color interpolation, grayscale conversion, or color vision deficiency simulation. ColrSpace requires Stata 14.2 or newer.
Color palettes for Stata graphics: an update
This paper is an update to Jann (2018). It contains a comprehensive discussion of the -colorpalette- command, including various changes and additions that have been made to the software since its first publication. The -colorpalette- command provides colors for use in Stata graphics. In addition to Stata's default colors, -colorpalette- supports a variety of named colors, a selection of palettes that have been proposed by users, numerous collections of palettes and colormaps from sources such as ColorBrewer, Carto, D3.js, or Matplotlib, as well as color generators in different color spaces. The command also provides features such as color interpolation or color vision deficiency simulation.