35,134 research outputs found

    Tree models for difference and change detection in a complex environment

    Full text link
    A new family of tree models is proposed, which we call "differential trees." A differential tree model is constructed from multiple data sets and aims to detect distributional differences between them. The new methodology differs from the existing difference and change detection techniques in its nonparametric nature, model construction from multiple data sets, and applicability to high-dimensional data. Through a detailed study of an arson case in New Zealand, where an individual is known to have been laying vegetation fires within a certain time period, we illustrate how these models can help detect changes in the frequencies of event occurrences and uncover unusual clusters of events in a complex environment.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS548 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Analytic Expressions for Stochastic Distances Between Relaxed Complex Wishart Distributions

    Full text link
    The scaled complex Wishart distribution is a widely used model for multilook full polarimetric SAR data whose adequacy has been attested in the literature. Classification, segmentation, and image analysis techniques which depend on this model have been devised, and many of them employ some type of dissimilarity measure. In this paper we derive analytic expressions for four stochastic distances between relaxed scaled complex Wishart distributions in their most general form and in important particular cases. Using these distances, inequalities are obtained which lead to new ways of deriving the Bartlett and revised Wishart distances. The expressiveness of the four analytic distances is assessed with respect to the variation of parameters. Such distances are then used for deriving new tests statistics, which are proved to have asymptotic chi-square distribution. Adopting the test size as a comparison criterion, a sensitivity study is performed by means of Monte Carlo experiments suggesting that the Bhattacharyya statistic outperforms all the others. The power of the tests is also assessed. Applications to actual data illustrate the discrimination and homogeneity identification capabilities of these distances.Comment: Accepted for publication in the IEEE Transactions on Geoscience and Remote Sensing journa

    A review of applied methods in Europe for flood-frequency analysis in a changing environment

    Get PDF
    The report presents a review of methods used in Europe for trend analysis, climate change projections and non-stationary analysis of extreme precipitation and flood frequency. In addition, main findings of the analyses are presented, including a comparison of trend analysis results and climate change projections. Existing guidelines in Europe on design flood and design rainfall estimation that incorporate climate change are reviewed. The report concludes with a discussion of research needs on non-stationary frequency analysis for considering the effects of climate change and inclusion in design guidelines. Trend analyses are reported for 21 countries in Europe with results for extreme precipitation, extreme streamflow or both. A large number of national and regional trend studies have been carried out. Most studies are based on statistical methods applied to individual time series of extreme precipitation or extreme streamflow using the non-parametric Mann-Kendall trend test or regression analysis. Some studies have been reported that use field significance or regional consistency tests to analyse trends over larger areas. Some of the studies also include analysis of trend attribution. The studies reviewed indicate that there is some evidence of a general increase in extreme precipitation, whereas there are no clear indications of significant increasing trends at regional or national level of extreme streamflow. For some smaller regions increases in extreme streamflow are reported. Several studies from regions dominated by snowmelt-induced peak flows report decreases in extreme streamflow and earlier spring snowmelt peak flows. Climate change projections have been reported for 14 countries in Europe with results for extreme precipitation, extreme streamflow or both. The review shows various approaches for producing climate projections of extreme precipitation and flood frequency based on alternative climate forcing scenarios, climate projections from available global and regional climate models, methods for statistical downscaling and bias correction, and alternative hydrological models. A large number of the reported studies are based on an ensemble modelling approach that use several climate forcing scenarios and climate model projections in order to address the uncertainty on the projections of extreme precipitation and flood frequency. Some studies also include alternative statistical downscaling and bias correction methods and hydrological modelling approaches. Most studies reviewed indicate an increase in extreme precipitation under a future climate, which is consistent with the observed trend of extreme precipitation. Hydrological projections of peak flows and flood frequency show both positive and negative changes. Large increases in peak flows are reported for some catchments with rainfall-dominated peak flows, whereas a general decrease in flood magnitude and earlier spring floods are reported for catchments with snowmelt-dominated peak flows. The latter is consistent with the observed trends. The review of existing guidelines in Europe on design floods and design rainfalls shows that only few countries explicitly address climate change. These design guidelines are based on climate change adjustment factors to be applied to current design estimates and may depend on design return period and projection horizon. The review indicates a gap between the need for considering climate change impacts in design and actual published guidelines that incorporate climate change in extreme precipitation and flood frequency. Most of the studies reported are based on frequency analysis assuming stationary conditions in a certain time window (typically 30 years) representing current and future climate. There is a need for developing more consistent non-stationary frequency analysis methods that can account for the transient nature of a changing climate

    How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

    Full text link
    Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.Comment: 36 page
    corecore