
Measuring the stability of histogram appearance when the anchor position is changed

Authors: Frederic Udina, Jeffrey S. Simonoff

Although the histogram is the most widely used density estimator, it is well known that the appearance of a constructed histogram for a given bin width can change markedly for different choices of anchor position. In this paper we construct a stability index G that assesses the potential changes in the appearance of histograms for a given data set and bin width as the anchor position changes. If a particular bin width choice leads to an unstable appearance, the arbitrary choice of any one anchor position is dangerous, and a different bin width should be considered. The index is based on the statistical roughness of the histogram estimate. We show via Monte Carlo simulation that densities with more structure are more likely to lead to histograms with unstable appearance. In addition, ignoring the precision to which the data values are provided when choosing the bin width leads to instability. We provide several real data examples to illustrate the properties of G. Applications to other binned density estimators are also discussed.

Keywords: Bin width, frequency polygon, Gini index, linear binning, Lorenz curve, Monte Carlo simulation

Research Papers in Economics
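The abstract's central observation, that one bin width can yield histograms of quite different appearance depending on the anchor, can be illustrated with a minimal Python sketch. This is not the paper's stability index G, which the abstract says is based on statistical roughness. The helper names `histogram_counts` and `count_modes`, the toy data, and the use of a simple count of local maxima as a crude proxy for "appearance" are all hypothetical choices made for illustration.

```python
import numpy as np

def histogram_counts(data, bin_width, anchor):
    """Bin counts on the grid anchor + k * bin_width (k any integer) covering the data."""
    data = np.asarray(data, dtype=float)
    # First edge: the largest grid point that is <= min(data).
    start = anchor + np.floor((data.min() - anchor) / bin_width) * bin_width
    # Enough bins that the last edge lies beyond max(data).
    n_bins = int(np.floor((data.max() - start) / bin_width)) + 1
    edges = start + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    return counts

def count_modes(counts):
    """Crude 'appearance' summary (an assumption, not G): local maxima, flat tops counted once."""
    # Collapse runs of equal counts so a flat-topped peak is counted a single time.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    # Pad with -1 so peaks at either end of the histogram are detected.
    padded = [-1] + list(collapsed) + [-1]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical toy data: same bin width, two different anchor positions.
data = [1.0, 1.2, 1.9, 2.1, 2.2, 2.8, 3.1, 3.9, 4.0, 4.1]
for anchor in (0.0, 0.5):
    counts = histogram_counts(data, bin_width=1.0, anchor=anchor)
    print(anchor, counts.tolist(), count_modes(counts))
# 0.0 [3, 3, 2, 2] 1
# 0.5 [2, 3, 2, 3] 2
```

Both histograms use the same ten observations and the same bin width, yet one shows a single flat-topped peak while the other looks bimodal. A mode count is only a rough stand-in for the roughness-based index the paper proposes.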

Three Sides of Smoothing: Categorical Data Smoothing, Nonparametric Regression, and Density Estimation

Author: Jeffrey S. Simonoff
Publication venue: Stern School of Business, New York University
Publication date: 01/01/1997

The past forty years have seen a great deal of research into the construction and properties of nonparametric estimates of smooth functions. This research has focused primarily on two sides of the smoothing problem: nonparametric regression and density estimation. Theoretical results for these two situations are similar, and multivariate density estimation was an early justification for the Nadaraya-Watson kernel regression estimator. A third, less well-explored, strand of applications of smoothing is to the estimation of probabilities in categorical data. In this paper the position of categorical data smoothing as a bridge between nonparametric regression and density estimation is explored. Nonparametric regression provides a paradigm for the construction of effective categorical smoothing estimates, and use of an appropriate likelihood function yields cell probability estimates with many desirable properties. Such estimates can be used to construct regression estimates when one or more of the categorical variables are viewed as response variables. They also lead naturally to the construction of well-behaved density estimates using local or penalized likelihood estimation, which can then be used in a regression context. Several real data sets are used to illustrate these points.

Statistics Working Papers Series

New York University Faculty Digital Archive
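The idea of borrowing the nonparametric regression paradigm for categorical data can be sketched by treating the observed proportions of ordered cells as responses and the cell index as the predictor. The minimal Python sketch below applies a local-constant (Nadaraya-Watson) Gaussian kernel smoother to those proportions and then renormalizes. The function name `smooth_cell_probabilities`, the Gaussian kernel, and the toy counts are hypothetical choices, and this is not the likelihood-based estimator the paper advocates.

```python
import numpy as np

def smooth_cell_probabilities(counts, bandwidth):
    """Kernel-smooth multinomial cell proportions over ordered cells (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    props = counts / counts.sum()               # raw cell proportions act as "responses"
    x = np.arange(len(counts), dtype=float)     # cell index acts as the "predictor"
    # Gaussian kernel weights between every pair of cells: K((i - j) / h)
    weights = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    # Local-constant (Nadaraya-Watson) estimate at each cell
    smoothed = (weights @ props) / weights.sum(axis=1)
    # Renormalize so the estimates again form a probability vector
    return smoothed / smoothed.sum()

# Hypothetical sparse table with empty cells.
counts = [0, 5, 1, 6, 0, 2]
print(np.round(np.asarray(counts) / sum(counts), 3))   # raw proportions
print(np.round(smooth_cell_probabilities(counts, bandwidth=1.0), 3))
```

Empty cells receive positive estimated probability because they borrow strength from their neighbours, and a very small bandwidth recovers the raw proportions. The local-constant smoother is used here only because it is the simplest; it is biased near the first and last cells, which is one reason to prefer local-linear or likelihood-based variants.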

An Empirical Study of Factors Relating to the Success of Broadway Shows

Authors: Lan Ma, Jeffrey S. Simonoff
Publication venue: Stern School of Business, New York University
Publication date: 01/01/2000

This article uses the Cox proportional hazards model to analyze recent Broadway show data to investigate the factors that relate to the longevity of shows. The type of show, whether a show is a revival, and first-week attendance for the show are predictive for longevity. Favorable critic reviews in the New York Daily News are related to greater success, but reviews in the New York Times are not. Winning major Tony Awards is associated with a longer run for a show, but being nominated for Tonys and then losing is associated with a shorter postaward run.

Statistics Working Papers Series

New York University Faculty Digital Archive

An Investigation of Missing Data Methods for Classification Trees

Authors: Yufeng Ding, Jeffrey S. Simonoff
Publication venue: Stern School of Business, New York University
Publication date: 03/12/2006

There are many different missing data methods used by classification tree algorithms, but few studies have compared their appropriateness and performance. This paper provides both analytic and Monte Carlo evidence regarding the effectiveness of six popular missing data methods for classification trees. We show that in the context of classification trees, the relationship between the missingness and the dependent variable, rather than the standard missingness classification approach of Little and Rubin (2002) (missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR)), is the most helpful criterion to distinguish different missing data methods. We make recommendations as to the best method to use in various situations. The paper concludes with a discussion of a real data set related to predicting bankruptcy of a firm.

Statistics Working Papers Series

New York University Faculty Digital Archive

An Investigation of Missing Data Methods for Classification Trees

Authors: Yufeng Ding, Jeffrey S. Simonoff
Publication date: 13/10/2008

New York University Faculty Digital Archive