
    Spatial Econometrics Revisited: A Case Study of Land Values in Roanoke County

    Omitting spatial characteristics such as proximity to amenities from hedonic land value models may lead to spatial autocorrelation and to biased and inefficient estimators. A spatial autoregressive error model can be used to model the spatial structure of errors arising from omitted spatial effects. This paper demonstrates an alternative approach to modeling land values based on individual and joint misspecification tests using data from Roanoke County in Virginia. Spatial autocorrelation is found in land value models of Roanoke County. Defining neighborhoods based on geographic and socioeconomic characteristics produces better estimates of neighborhood effects on land values than simple distance measures. Implementing a comprehensive set of individual and joint misspecification tests results in better correction for misspecification errors compared to existing practices. Keywords: Land Economics/Use.
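
    For reference, the spatial autoregressive error specification the abstract refers to is commonly written as below; the weights matrix W and the spatial parameter \lambda are standard textbook notation rather than quantities quoted from the paper:

    ```latex
    % Hedonic land value model with spatial autoregressive (SAR) errors -- standard notation
    y = X\beta + u, \qquad u = \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n),
    % equivalently
    u = (I_n - \lambda W)^{-1}\varepsilon .
    ```

    In this notation the neighborhood definition enters through the spatial weights matrix W, which is where the abstract's contrast between simple distance measures and geographic-plus-socioeconomic neighborhoods plays out.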

    Cluster-Robust Variance Estimation for Dyadic Data

    Dyadic data are common in the social sciences, but inference in such settings requires accounting for a complex clustering structure. Many analyses in the social sciences fail to account for the fact that multiple dyads share a member, and that errors are therefore likely correlated across these dyads. We propose a nonparametric sandwich-type robust variance estimator for linear regression that accounts for such clustering in dyadic data. We enumerate conditions for estimator consistency. We also extend our results to repeated and weighted observations, including directed dyads and longitudinal data, and provide an implementation for generalized linear models such as logistic regression. We examine empirical performance with simulations and with applications to international relations and speed dating.
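
    As a hedged illustration of the sandwich idea described above (this is not the authors' implementation; the function name, the O(n^2) loop and the OLS setting are choices made for the sketch), the "meat" sums cross-products of score contributions over every pair of observations whose dyads share a member:

    ```python
    import numpy as np

    def dyadic_robust_vcov(X, y, dyads):
        """Sandwich-type variance estimate for OLS with dyadic outcomes.

        X     : (n, k) design matrix, one row per dyad observation
        y     : (n,) outcomes
        dyads : list of (i, j) member pairs, one per observation

        Illustrative sketch only: the 'meat' accumulates cross-products of
        score contributions over every pair of observations whose dyads
        share at least one member (each observation also pairs with itself),
        so error correlation induced by shared members is not ignored.
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, k = X.shape

        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS point estimates
        resid = y - X @ beta
        bread = np.linalg.inv(X.T @ X)

        scores = X * resid[:, None]                    # x_d * e_d for each dyad d
        members = [set(d) for d in dyads]

        meat = np.zeros((k, k))
        for a in range(n):                             # naive O(n^2) double loop
            for b in range(n):
                if members[a] & members[b]:            # the two dyads share a member
                    meat += np.outer(scores[a], scores[b])

        return beta, bread @ meat @ bread              # point estimates, robust vcov
    ```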

    Variance component score test for time-course gene set analysis of longitudinal RNA-seq data

    As gene expression measurement technology shifts from microarrays to sequencing, the statistical tools available for analyzing expression data must be adapted, since RNA-seq data are measured as counts. Recently, it has been proposed to tackle the count nature of these data by modeling log-count reads per million as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity. Adopting such a framework, we propose tcgsaseq, a principled, model-free and efficient top-down method for detecting longitudinal changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq identifies those whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the transformed counts. We demonstrate that, despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, the proposed method is shown to exhibit very good statistical properties, with an increase in stability and power when compared to the state-of-the-art methods ROAST, edgeR and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available to the community in the R package tcgsaseq.
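
    For readers unfamiliar with the transformation the abstract builds on, a minimal sketch of a log-counts-per-million transform is given below; the 0.5 pseudo-count and the library-size offset follow common practice and are assumptions of the sketch, not necessarily the exact preprocessing used by tcgsaseq:

    ```python
    import numpy as np

    def log_cpm(counts):
        """Transform a genes x samples count matrix to log2 counts per million.

        A common way of treating RNA-seq counts as continuous variables; the
        0.5 pseudo-count avoids log(0) and is an assumption of this sketch.
        """
        counts = np.asarray(counts, dtype=float)
        lib_sizes = counts.sum(axis=0)                    # total reads per sample
        cpm = (counts + 0.5) / (lib_sizes + 1.0) * 1e6    # counts per million
        return np.log2(cpm)
    ```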

    Generic Conditions for Forecast Dominance

    Recent studies have analyzed whether one forecast method dominates another under a class of consistent scoring functions. While the existing literature focuses on empirical tests of forecast dominance, little is known about the theoretical conditions under which one forecast dominates another. To address this question, we derive a new characterization of dominance among forecasts of the mean functional. We present various scenarios under which dominance occurs. Unlike existing results, our results allow for the case that the forecasts' underlying information sets are not nested, and allow for uncalibrated forecasts that suffer, e.g., from model misspecification or parameter estimation error. We illustrate the empirical relevance of our results with data examples from finance and economics.
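
    For concreteness, the scoring functions that are consistent for the mean functional are (under standard regularity conditions) the Bregman class, and dominance is defined relative to that entire class; the notation below is standard rather than quoted from the paper:

    ```latex
    % Bregman scoring functions, consistent for the mean (\phi convex):
    S_\phi(x, y) = \phi(y) - \phi(x) - \phi'(x)\,(y - x).
    % Forecast X dominates forecast X^* for the outcome Y if, for every such S_\phi,
    \mathbb{E}\left[ S_\phi(X, Y) \right] \le \mathbb{E}\left[ S_\phi(X^*, Y) \right].
    ```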

    Model selection in neural networks

    In this article we examine how model selection in neural networks can be guided by statistical procedures such as hypothesis tests, information criteria and cross-validation. The application of these methods to neural network models is discussed, paying particular attention to the identification problems encountered. We then propose five specification strategies based on different statistical procedures and compare them in a simulation study. As the results of the study are promising, we suggest that statistical analysis should become an integral part of neural network modelling. Keywords: Neural Networks, Statistical Inference, Model Selection, Identification, Information Criteria, Cross Validation.
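
    As a hedged illustration of one of the procedures discussed (the use of scikit-learn and the toy data below are assumptions of the sketch, not the authors' setup), the number of hidden units of a one-hidden-layer network can be chosen by k-fold cross-validation:

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    # Toy data purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

    # Candidate architectures: one hidden layer with h units.
    cv_mse = {}
    for h in [1, 2, 4, 8, 16]:
        net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000, random_state=0)
        scores = cross_val_score(net, X, y, cv=5, scoring="neg_mean_squared_error")
        cv_mse[h] = -scores.mean()                 # cross-validated MSE for this size

    best_h = min(cv_mse, key=cv_mse.get)           # smallest cross-validated MSE wins
    print(cv_mse, "-> selected hidden units:", best_h)
    ```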

    Estimation of Conditional Power for Cluster-Randomized Trials with Interval-Censored Endpoints

    Cluster-randomized trials (CRTs) of infectious disease prevention interventions often yield correlated, interval-censored data: dependencies may exist between observations from the same cluster, and event occurrence may be assessed only at intermittent clinic visits. This data structure must be accounted for when conducting interim monitoring and futility assessment for CRTs. In this article, we propose a flexible framework for conditional power estimation when outcomes are correlated and interval-censored. Under the assumption that the survival times follow a shared frailty model, we first characterize the correspondence between the marginal and cluster-conditional survival functions, and then use this relationship to semiparametrically estimate the cluster-specific survival distributions from the available interim data. We incorporate assumptions about changes to the event process over the remainder of the trial, as well as estimates of the dependency among observations in the same cluster, to extend these survival curves through the end of the study. Based on these projected survival functions, we generate correlated interval-censored observations and then calculate the conditional power as the proportion of times (across multiple full-data generation steps) that the null hypothesis of no treatment effect is rejected. We evaluate the performance of the proposed method through extensive simulation studies, and illustrate its use on a large cluster-randomized HIV prevention trial.
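
    The marginal/cluster-conditional correspondence mentioned above has a well-known closed form in the gamma-frailty special case; the gamma choice and the notation below are assumptions for illustration, not necessarily the paper's exact specification:

    ```latex
    % Shared frailty model: cluster i, subject j, frailty b_i with E[b_i] = 1, Var(b_i) = \theta.
    S(t \mid b_i) = \exp\{-b_i \Lambda_{ij}(t)\}, \qquad
    S_{\mathrm{marg}}(t) = \mathbb{E}_{b_i}\!\left[ e^{-b_i \Lambda_{ij}(t)} \right].
    % With gamma frailty this Laplace transform, and its inverse, are explicit:
    S_{\mathrm{marg}}(t) = \bigl(1 + \theta\,\Lambda_{ij}(t)\bigr)^{-1/\theta}, \qquad
    \Lambda_{ij}(t) = \frac{S_{\mathrm{marg}}(t)^{-\theta} - 1}{\theta}.
    ```

    In this notation, moving between S(t | b_i) and S_marg(t) is what allows cluster-specific survival curves to be recovered from interim data and then extended through the end of the trial.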