
    Distributed Monitoring of the R² Statistic for Linear Regression

    The problem of monitoring a multivariate linear regression model is relevant to studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment, where only a subset of instances is available at each node and the local data change frequently. In such dynamic settings, data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection. The goal, therefore, is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees for such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead for monitoring the quality of a regression model in terms of its coefficient of determination (the R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and we also provide theoretical guarantees on correctness.
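
The abstract does not spell out DReMo's local monitoring conditions. What follows is therefore only a minimal Python sketch, under stated assumptions, of why R² can be evaluated over the union of all nodes' data without centralizing the rows. For a fixed shared model, R² = 1 − SSE/SST depends only on additive per-node summaries: the count, Σy, Σy², and the local residual sum of squares. The names `LocalStats`, `local_stats`, `global_r2` and `needs_refit` are hypothetical. This sketch aggregates the summaries every time it is called, which is precisely the communication DReMo is designed to avoid. It illustrates the monitored quantity, not the paper's protocol.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalStats:
    """Additive summaries one node can compute from its own rows (hypothetical)."""
    n: int         # number of local instances
    sum_y: float   # sum of targets
    sum_y2: float  # sum of squared targets
    sse: float     # sum of squared residuals under the shared model


def local_stats(rows: Sequence[Sequence[float]], ys: Sequence[float],
                beta: Sequence[float]) -> LocalStats:
    """Summarise one node's data for a fixed coefficient vector beta.

    Assumption: each row already includes a leading 1.0 for the intercept.
    """
    sse = 0.0
    for x, y in zip(rows, ys):
        pred = sum(b * xi for b, xi in zip(beta, x))
        sse += (y - pred) ** 2
    return LocalStats(len(ys), sum(ys), sum(y * y for y in ys), sse)


def global_r2(stats: List[LocalStats]) -> float:
    """Combine node summaries: R^2 = 1 - SSE / SST over the union of all data."""
    n = sum(s.n for s in stats)
    sum_y = sum(s.sum_y for s in stats)
    sum_y2 = sum(s.sum_y2 for s in stats)
    sse = sum(s.sse for s in stats)
    sst = sum_y2 - sum_y * sum_y / n  # total sum of squares about the global mean
    return 1.0 - sse / sst


def needs_refit(stats: List[LocalStats], threshold: float) -> bool:
    """True when the global R^2 has dropped below the (assumed fixed) threshold."""
    return global_r2(stats) < threshold


# Toy example: two nodes sharing the model y = 1 + 2x.
node_a = local_stats([[1, 0], [1, 1]], [1, 3], beta=[1, 2])
node_b = local_stats([[1, 2], [1, 3]], [5, 8], beta=[1, 2])
print(global_r2([node_a, node_b]))
```

Only four numbers per node travel to the aggregator, whatever the number of local rows. The sketch still sends them on every check, whereas a monitoring scheme like the one described above aims to stay silent while R² remains above the threshold.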

    Sequential Testing with Uniformly Distributed Size

    Sequential procedures of testing for structural stability do not provide enough guidance on the shape of the boundaries used to decide on acceptance or rejection; they require only that the overall size of the test be asymptotically controlled. We introduce and motivate a reasonable criterion for the shape of the boundaries, which requires that the test size be uniformly distributed over the testing period. Under this criterion, we numerically construct boundaries for the most popular sequential tests, which are characterized by a test statistic behaving asymptotically either as a Wiener process or as a Brownian bridge. We handle this problem both in the context of retrospecting a historical sample and in the context of monitoring newly arriving data. We tabulate the boundaries by fitting them to certain flexible but parsimonious functional forms. Interesting patterns emerge in an illustrative application of sequential tests to the Phillips curve model.
    Keywords: Structural stability; sequential tests; CUSUM; retrospection; monitoring; boundaries; asymptotic size
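
To make the uniform-size criterion concrete, here is a minimal Monte Carlo sketch in Python. It estimates, under the null, how the probability of a first rejection is spread across the testing period for a given boundary. Several pieces are assumptions for illustration: the partial-sum (CUSUM-type) statistic scaled to behave like a Wiener process, the flat boundary of 2.5, and the names `first_crossing` and `rejection_time_profile`. The paper's fitted boundaries are not reproduced here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def first_crossing(residuals: Sequence[float], sigma: float,
                   boundary: Callable[[float], float]) -> Optional[int]:
    """Return the first k at which |S_k| / (sigma * sqrt(n)) exceeds boundary(k / n).

    S_k is the partial sum of the first k residuals, so under the null the
    scaled path behaves asymptotically like a Wiener process on [0, 1].
    Returns None if the boundary is never crossed (no rejection).
    """
    n = len(residuals)
    scale = sigma * math.sqrt(n)
    s = 0.0
    for k, e in enumerate(residuals, start=1):
        s += e
        if abs(s) / scale > boundary(k / n):
            return k
    return None


def rejection_time_profile(boundary: Callable[[float], float], n: int = 200,
                           reps: int = 2000, bins: int = 4,
                           seed: int = 0) -> List[float]:
    """Monte Carlo estimate of P(first rejection falls in each time bin) under the null.

    The entries sum to the estimated overall size; a boundary meeting the
    uniform-size criterion would give roughly equal entries.
    """
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(reps):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. N(0, 1) null residuals
        k = first_crossing(e, 1.0, boundary)
        if k is not None:
            counts[(k - 1) * bins // n] += 1
    return [c / reps for c in counts]


# Hypothetical flat boundary; not one of the paper's tabulated boundaries.
profile = rejection_time_profile(lambda t: 2.5)
print(profile, sum(profile))
```

With a flat boundary the early bins tend to receive fewer rejections, because the partial-sum process has small variance near t = 0. That imbalance is the kind of behaviour a uniform-size criterion is meant to correct by reshaping the boundary.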

    Cheap versus expensive trades: Assessing the determinants of market impact costs

    This paper assesses the determinants of the market impact costs of institutional equity trades, using unique data from the world's second-largest pension fund. We allow the impact of trade characteristics and market conditions on trading costs to depend on the level of the trading costs themselves, and we establish significant differences in the responses of cheaper and more expensive trades. We explain these distinct responses by differences in information content and demand for liquidity between trades with high and low trading costs. Finally, to illustrate the practical relevance of the approach, we use our method to forecast future trading costs.

    HIV with contact-tracing: a case study in Approximate Bayesian Computation

    Missing data are a recurrent issue in epidemiology, where the infection process may be only partially observed. Approximate Bayesian Computation (ABC), an alternative to data imputation methods such as Markov Chain Monte Carlo (MCMC) integration, is proposed for making inference in epidemiological models. It is a likelihood-free method that relies exclusively on numerical simulations. ABC consists of computing a distance between simulated and observed summary statistics and weighting the simulations according to this distance. We propose an original extension of ABC to path-valued summary statistics, corresponding to the cumulative number of detections as a function of time. For a standard compartmental model with Susceptible, Infectious and Recovered individuals (SIR), we show that the posterior distributions obtained with ABC and MCMC are similar. In a refined SIR model well suited to the HIV contact-tracing data in Cuba, we compare ABC using full and binned detection times. For the Cuban data, we evaluate the efficiency of the detection system and predict the evolution of the HIV-AIDS disease. In particular, the percentage of undetected infectious individuals is found to be of the order of 40%.
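
The ABC recipe above can be illustrated with a minimal rejection-ABC sketch in Python that uses a path-valued summary statistic, the cumulative-detection curve. Everything specific here is an assumption for illustration: the discrete-time chain-binomial SIR simulator, treating every removal as a detection, the uniform priors, the sup-norm distance between curves, and all function names. The refined HIV model and the paper's weighting scheme are not reproduced. The weighting is the simplest possible one: the closest fraction of simulations is kept, each with equal weight.

```python
import random
from typing import List, Sequence, Tuple


def simulate_sir(beta: float, gamma: float, n: int, i0: int, steps: int,
                 rng: random.Random) -> List[int]:
    """Discrete-time chain-binomial SIR model (hypothetical simulator).

    Returns the cumulative number of removals after each step; in this sketch
    every removal is assumed to be a detection.
    """
    s, i, cum = n - i0, i0, 0
    path = []
    for _ in range(steps):
        # Each susceptible escapes each infective independently with prob 1 - beta/n.
        p_inf = 1.0 - (1.0 - beta / n) ** i
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rem = sum(rng.random() < gamma for _ in range(i))
        s, i, cum = s - new_inf, i + new_inf - new_rem, cum + new_rem
        path.append(cum)
    return path


def path_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sup-norm distance between two cumulative-detection curves."""
    return max(abs(x - y) for x, y in zip(a, b))


def abc_rejection(observed: Sequence[int], n: int, i0: int, n_sims: int,
                  keep_frac: float, rng: random.Random) -> List[Tuple[float, float]]:
    """Keep the (beta, gamma) draws whose simulated curves lie closest to the data."""
    draws = []
    for _ in range(n_sims):
        beta = rng.uniform(0.5, 3.0)    # assumed uniform prior on transmission
        gamma = rng.uniform(0.05, 0.5)  # assumed uniform prior on removal
        sim = simulate_sir(beta, gamma, n, i0, len(observed), rng)
        draws.append((path_distance(sim, observed), beta, gamma))
    draws.sort()  # smallest distance first
    k = max(1, int(keep_frac * n_sims))
    return [(b, g) for _, b, g in draws[:k]]


rng = random.Random(42)
observed = simulate_sir(1.5, 0.2, n=100, i0=5, steps=30, rng=rng)  # synthetic "data"
posterior = abc_rejection(observed, n=100, i0=5, n_sims=500, keep_frac=0.05, rng=rng)
```

The accepted pairs in `posterior` form an approximate posterior sample. Replacing the equal weights with a kernel that decays with distance is a common refinement.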

    Measurement and Modeling of Ground-Level Ozone Concentration in Catania, Italy using Biophysical Remote Sensing and GIS

    This experimental study examined the spatial variation of ground-level ozone (O3) in the city of Catania, Italy, using thirty passive samplers deployed in a 500-m grid pattern. Significant spatial variation in ground-level O3 concentrations (ranging from 12.8 to 41.7 µg/m³) was detected across Catania’s urban core and periphery. Biophysical measures derived from satellite imagery and built-environment characteristics from GIS were evaluated as correlates of O3 concentrations. A land use regression model based on four variables (land surface temperature, building area, residential street length, and distance to the coast) explained 74% of the variance (adjusted R²) in measured O3. The results suggest that biophysical remote sensing variables merit further investigation as predictors of ground-level O3 (and potentially other air pollutants), because they provide objective measurements that can be tested across multiple locations and over time.
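
Adjusted R² penalizes the ordinary R² for the number of predictors p relative to the number of observations n: R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1). The small Python helpers below make the adjustment and its inverse explicit; their names are hypothetical. The abstract does not state how many of the thirty samplers entered the final model, so the usage line treats n = 30 purely as an assumption.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def unadjusted_r2(r2_adj: float, n: int, p: int) -> float:
    """Invert adjusted_r2 to recover the ordinary R^2."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2_adj) * (n - p - 1) / (n - 1)


# Assumption: all 30 samplers used, with the model's 4 predictors.
print(round(unadjusted_r2(0.74, n=30, p=4), 3))  # 0.776
```

With few observations per predictor the gap between the two statistics is noticeable, which is why the adjusted figure is the more conservative one to report.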

    Modeling Compatible Single-Tree Aboveground Biomass Equations of Masson Pine (Pinus massoniana) in South China

    In the context of global climate change, adding forest biomass estimation to national forest resource monitoring has become an inevitable requirement. Biomass equations developed for forest biomass estimation should be compatible with volume equations. Based on tree volume and aboveground biomass data for Masson pine (Pinus massoniana Lamb.) in south China, this paper constructs one-, two- and three-variable aboveground biomass equations, together with biomass conversion functions compatible with tree volume equations, using error-in-variable simultaneous equations. The results showed that: (i) the prediction precision of aboveground biomass estimates from the one-variable equation exceeded 95%; (ii) the fit of the aboveground biomass equations improved only slightly when tree height and crown width were used together with diameter at breast height, although their contributions to the regressions were statistically significant; (iii) for the one-variable biomass conversion function, the conversion factor decreased with increasing diameter, whereas for the two-variable conversion function it increased with diameter and decreased with tree height.
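
One common way to make a biomass equation compatible with a volume equation is to define biomass as volume times a conversion function, so that the two predictions cannot contradict each other. The derivation below assumes, purely for illustration, one-variable power-law forms in diameter at breast height D. The abstract gives neither the paper's functional forms nor its error-in-variable estimation details. Here V is volume, M is aboveground biomass, BCF is the biomass conversion function, and a, b, c, d are hypothetical positive parameters.

```latex
\begin{aligned}
V(D) &= a\,D^{b}, \qquad M(D) = c\,D^{d}, \\
\mathrm{BCF}(D) &= \frac{M(D)}{V(D)} = \frac{c}{a}\,D^{\,d-b}, \\
\frac{\mathrm{d}\,\mathrm{BCF}}{\mathrm{d}D} &= \frac{c}{a}\,(d-b)\,D^{\,d-b-1}.
\end{aligned}
```

With positive a and c, the conversion factor decreases with diameter exactly when d < b, that is, when biomass grows more slowly with diameter than volume does. Under these assumed forms, that is one way a decreasing one-variable conversion factor like the one reported above could arise.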

    Some statistical aspects of the long-term gill net monitoring programme for pike Esox lucius in Windermere (English Lake District)

    For more than 55 years, data have been collected on the population of pike Esox lucius in Windermere, first by the Freshwater Biological Association (FBA) and, since 1989, by the Institute of Freshwater Ecology (IFE) of the NERC Centre for Ecology and Hydrology. The aim of this article is to explore some methodological and statistical issues associated with the precision of pike gill net catches and catch-per-unit-effort (CPUE) data, beyond those examined by Bagenal (1972), and especially in light of the current deployment within the Windermere long-term sampling programme. Specifically, consideration is given to the precision of catch estimates from gill netting, including the effects of sampling different locations, the effectiveness of sampling for distinguishing between years, and the effects of changing fishing effort.
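
As a rough illustration of the precision questions raised above, the Python sketch below computes CPUE per net set, the relative standard error (RSE) of the mean CPUE, and the number of sets needed to reach a target RSE. It uses the relation RSE = CV/√n, which gives n = (CV/RSE)². The sketch assumes the net sets are independent draws from a single population and ignores the location and year effects the article examines. The function names and the toy catches are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def cpue_values(catches: Sequence[float], efforts: Sequence[float]) -> List[float]:
    """Catch per unit effort for each net set (e.g. fish per net-night)."""
    return [c / e for c, e in zip(catches, efforts)]


def cpue_precision(catches: Sequence[float],
                   efforts: Sequence[float]) -> Tuple[float, float]:
    """Return (mean CPUE, relative standard error of the mean)."""
    cpue = cpue_values(catches, efforts)
    mean = statistics.mean(cpue)
    se = statistics.stdev(cpue) / math.sqrt(len(cpue))
    return mean, se / mean


def sets_needed(catches: Sequence[float], efforts: Sequence[float],
                target_rse: float) -> int:
    """Net sets needed for a target RSE, assuming the pilot CV stays the same."""
    cpue = cpue_values(catches, efforts)
    cv = statistics.stdev(cpue) / statistics.mean(cpue)
    return math.ceil((cv / target_rse) ** 2)


# Toy pilot data: four net sets, one unit of effort each.
print(cpue_precision([2, 4, 6, 8], [1, 1, 1, 1]))
print(sets_needed([2, 4, 6, 8], [1, 1, 1, 1], target_rse=0.1))  # 27
```

Because the required number of sets scales with the square of the CV, halving the target RSE roughly quadruples the sampling effort.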