
    Clustering of nonstationary data streams: a survey of fuzzy partitional methods

    Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered and massive, they contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature exists on stream data clustering; however, less attention has been devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms, focusing mainly on fuzzy methods, including their treatment of outliers and of concept drift and shift.
    Ministero dell’Istruzione, dell’Università e della Ricerca
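The survey centers on fuzzy partitional methods. As a minimal sketch, assuming the standard batch fuzzy c-means (FCM) formulation with fuzzifier m and Euclidean distance (not any specific streaming algorithm from the survey), the two alternating FCM updates look like this. The function names `fcm_memberships` and `fcm_centers` are hypothetical.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Batch fuzzy c-means membership update.

    X: (n, d) data points; centers: (c, d) prototypes; m > 1 is the fuzzifier.
    Returns U of shape (n, c) whose rows each sum to 1.
    """
    # Distance from every point to every center, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against division by zero when a point sits on a center
    # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); ratio[i, j, k] = d_ij / d_ik
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def fcm_centers(X, U, m=2.0):
    """Center update: mean of the points weighted by u_ij^m."""
    W = U ** m  # (n, c)
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

A batch run alternates the two functions until the centers stop moving. Stream-oriented variants often apply such updates chunk by chunk and carry weighted prototypes forward; how they do so, and how they handle outliers and drift, differs between the surveyed methods.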

    Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability

    A flexible method is developed to construct a confidence interval for the frequency of a queried object in a very large data set, based on a much smaller sketch of the data. The approach requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals for random queries using a conformal inference approach. After achieving marginal coverage for random queries under the assumption of data exchangeability, the proposed method is extended to provide stronger inferences accounting for possibly heterogeneous frequencies of different random queries, redundant queries, and distribution shifts. While the presented methods are broadly applicable, this paper focuses on use cases involving the count-min sketch algorithm and a non-linear variation thereof, to facilitate comparison to prior work. In particular, the developed methods are compared empirically to frequentist and Bayesian alternatives, through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
    Comment: 56 pages, 31 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2204.0427
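The method treats the sketch as a black box, so for readers unfamiliar with the count-min sketch itself, a minimal Python sketch follows. It assumes BLAKE2b with a per-row salt as the hash family (a convenient stand-in, not the pairwise-independent hash family of the classical construction), and the class name `CountMinSketch` is illustrative. It shows only the sketch and its one-sided error; the paper's conformal intervals are not reproduced here.

```python
import hashlib

class CountMinSketch:
    """A depth x width table of counters with one hash function per row."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # A per-row salt gives each row its own hash function
        # (BLAKE2b accepts a salt of at most 16 bytes).
        h = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8,
                            salt=row.to_bytes(16, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for r in range(self.depth):
            self.table[r][self._index(r, item)] += count

    def query(self, item):
        # Collisions only ever add to a counter, so the row minimum is an
        # upper bound on the true frequency (never an underestimate).
        return min(self.table[r][self._index(r, item)] for r in range(self.depth))
```

Because the estimate can only overshoot, the interesting question is how far above the true count it lies, which is what a confidence interval built on top of the sketch has to quantify.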

    Game-theoretic statistics and safe anytime-valid inference

    Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty (e-processes for testing and confidence sequences for estimation) that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
    Comment: 25 pages. Under review. arXiv does not compile/space some references properly
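The idea of a test martingale as a bettor's wealth can be illustrated with the simplest case. The following minimal sketch assumes a fair-coin null hypothesis H0: P(heads) = 1/2 and a fixed betting fraction `lam` (an illustrative choice; practical constructions often adapt the bet to past data). Under H0 each wealth multiplier has expectation one, so the wealth is a test martingale, and by Ville's inequality rejecting the first time it reaches 1/alpha keeps the type I error at most alpha at any stopping time. The function names are hypothetical.

```python
def coin_test_martingale(flips, lam=0.5):
    """Wealth path of a bettor testing H0: P(heads) = 1/2.

    flips: iterable of 0/1 outcomes (1 = heads); lam in (-1, 1) is the
    fraction of current wealth bet on heads each round.
    """
    if not -1.0 < lam < 1.0:
        raise ValueError("lam must lie in (-1, 1) to keep wealth positive")
    wealth = 1.0  # a test martingale starts at one
    path = [wealth]
    for x in flips:
        # Gain lam * wealth on heads, lose it on tails; E[factor] = 1 under H0.
        wealth *= 1.0 + lam * (2 * x - 1)
        path.append(wealth)
    return path

def first_rejection(path, alpha=0.05):
    """First index at which wealth reaches 1/alpha, or None if it never does."""
    for t, w in enumerate(path):
        if w >= 1.0 / alpha:
            return t
    return None
```

With a constant bet, `first_rejection(coin_test_martingale([1] * 8), alpha=0.05)` returns 8, since 1.5**7 ≈ 17.1 < 20 ≤ 1.5**8 ≈ 25.6.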


    Novel methods for reducing agricultural nutrient loading and eutrophication

    In many intensively cultivated areas, surface and ground waters suffer from eutrophication and deterioration of water quality. To improve the environmental protection measures of agriculture, EU countries have adopted common legislation, such as the Nitrates Directive and the Water Framework Directive, which respectively set limits on the use of manure and aim at good ecological status of waters by 2015. Moreover, various voluntary measures and environmental schemes are financially supported by the EU and national governments to reduce agricultural nutrient loading and eutrophication, for instance by optimizing phosphorus (P) and nitrogen (N) fertilization, controlling erosion, and promoting the establishment of buffer zones and wetlands. Yet good ecological status appears to be unattainable in the near future for many water bodies under agricultural loading. Nutrients previously accumulated in soils and sediments retard the recovery of waters, the implementation of environmentally friendly measures may be inadequate, or the measures themselves may be inefficient. There is an obvious need for novel methods and techniques that speed up load reduction and the recovery of different types of water bodies, and that farmers can easily adopt and other stakeholders in the river basins can put into practice. The aim of this workshop, held at MTT Agrifood Research in June 2010, is to discuss novel methods for reducing agricultural nutrient losses and alleviating their effects in water bodies. The novel methods may include:
    - chemical amendments to reduce soil loss or to immobilize P in soils or in wetlands;
    - filter systems to remove P from field runoff;
    - removal of N from runoff waters by fixation to innovative materials;
    - use of sediment traps;
    - capturing P in sediments.
    Targeted and cost-effective use of such methods requires that we recognise the sources and transport routes of nutrients, the critical steps in the load-generating processes, and the magnitude of responses in the rivers, lakes and coastal waters suffering from eutrophication. Moreover, the limitations, possible risks and side effects must be evaluated. This issue of MTT Science gathers together the abstracts of the oral and poster presentations held at the workshop.