Clustering of nonstationary data streams: a survey of fuzzy partitional methods
Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available about stream data clustering; however, less attention is devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms focusing mainly on fuzzy methods, including their treatment of outliers and concept drift and shift.
Ministero dell'Istruzione, dell'Università e della Ricerca
Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability
A flexible method is developed to construct a confidence interval for the
frequency of a queried object in a very large data set, based on a much smaller
sketch of the data. The approach requires no knowledge of the data distribution
or of the details of the sketching algorithm; instead, it constructs provably
valid frequentist confidence intervals for random queries using a conformal
inference approach. After achieving marginal coverage for random queries under
the assumption of data exchangeability, the proposed method is extended to
provide stronger inferences accounting for possibly heterogeneous frequencies
of different random queries, redundant queries, and distribution shifts. While
the presented methods are broadly applicable, this paper focuses on use cases
involving the count-min sketch algorithm and a non-linear variation thereof, to
facilitate comparison to prior work. In particular, the developed methods are
compared empirically to frequentist and Bayesian alternatives, through
simulations and experiments with data sets of SARS-CoV-2 DNA sequences and
classic English literature.
Comment: 56 pages, 31 figures, 2 tables. arXiv admin note: substantial text
overlap with arXiv:2204.0427
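For readers unfamiliar with the count-min sketch mentioned in this abstract, the following is a minimal sketch of the standard data structure only, not of the paper's conformal procedure. The class name `CountMinSketch`, the `width` and `depth` parameters, and the use of salted BLAKE2b hashes are illustrative assumptions, not details taken from the paper.

```python
import hashlib


class CountMinSketch:
    """Standard count-min sketch (illustrative; names are hypothetical)."""

    def __init__(self, width=64, depth=4):
        self.width = width    # number of counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; the salt makes the rows behave differently.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true frequency.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=32, depth=3)
words = ["the", "whale", "the", "sea", "the", "whale"]
for word in words:
    sketch.add(word)
```

Here `sketch.estimate("the")` is at least 3 and never exceeds the total number of insertions. The estimate is deterministic and one-sided; the abstract's contribution is a confidence interval around such a sketch-based estimate.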
Game-theoretic statistics and safe anytime-valid inference
Safe anytime-valid inference (SAVI) provides measures of statistical evidence
and certainty -- e-processes for testing and confidence sequences for
estimation -- that remain valid at all stopping times, accommodating continuous
monitoring and analysis of accumulating data and optional stopping or
continuation for any reason. These measures crucially rely on test martingales,
which are nonnegative martingales starting at one. Since a test martingale is
the wealth process of a player in a betting game, SAVI centrally employs
game-theoretic intuition, language and mathematics. We summarize the SAVI goals
and philosophy, and report recent advances in testing composite hypotheses and
estimating functionals in nonparametric settings.
Comment: 25 pages. Under review. ArXiv does not compile/space some references
properly.
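The description of a test martingale as the wealth process of a player in a betting game can be illustrated with a minimal sketch, assuming the simplest case: testing whether a coin is fair with a fixed bet. The function names `wealth_process` and `first_rejection_time`, the fixed `bet` fraction, and the level `alpha` are illustrative assumptions, not notation from the paper. The rejection rule relies on Ville's inequality, which bounds by `alpha` the probability that a test martingale ever reaches `1 / alpha` under the null.

```python
def wealth_process(flips, bet=0.5):
    """Wealth of a player betting on heads against the null 'coin is fair'.

    flips: iterable of 0/1 outcomes (1 = heads).
    bet:   fraction of current wealth staked on heads, in [-1, 1].
    Under the null, each factor has expectation 1, so the wealth is a
    nonnegative martingale starting at one.
    """
    if not -1.0 <= bet <= 1.0:
        raise ValueError("bet must lie in [-1, 1] to keep wealth nonnegative")
    wealth = 1.0
    path = [wealth]
    for x in flips:
        wealth *= 1.0 + bet * (2 * x - 1)  # gain on heads, loss on tails
        path.append(wealth)
    return path


def first_rejection_time(path, alpha=0.05):
    """First time the wealth reaches 1/alpha, or None if it never does."""
    threshold = 1.0 / alpha
    for t, wealth in enumerate(path):
        if wealth >= threshold:
            return t
    return None


always_heads = wealth_process([1] * 10)
alternating = wealth_process([1, 0] * 5)
```

With ten heads in a row the wealth grows as 1.5 to the power t and first reaches the threshold 20 after eight flips, so `first_rejection_time(always_heads)` returns 8. For the alternating sequence the wealth shrinks and the function returns `None`. Because the guarantee holds at every stopping time, the player may check the wealth after each flip without inflating the error rate.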
Novel methods for reducing agricultural nutrient loading and eutrophication
In many intensively cultivated areas, surface and ground waters suffer from eutrophication and deteriorating water quality. To improve environmental protection in agriculture, EU countries have adopted common legislation, such as the Nitrates Directive and the Water Framework Directive, which set limits on the use of manure and aim at a good ecological state of waters by 2015, respectively. Moreover, various voluntary measures and environmental schemes are supported financially by the EU and national governments to reduce agricultural nutrient loading and eutrophication, for instance by optimizing phosphorus (P) and nitrogen (N) fertilization, controlling erosion and promoting the establishment of buffer zones and wetlands.
Yet a good ecological state appears to be unattainable in many agriculturally loaded water bodies in the near future. Nutrients previously accumulated in soils and sediments slow the recovery of waters, the implementation of environmentally friendly measures may be inadequate, or the measures themselves may be inefficient. There is an obvious need for novel methods and new techniques that speed up load reduction and the recovery of different types of water bodies, and that could be easily adopted by farmers and put into practice by other stakeholders in the river basins.
The aim of this workshop, held at MTT Agrifood Research in June 2010, is to discuss novel methods for reducing agricultural nutrient losses and alleviating their effects in water bodies. The novel methods may include:
- chemical amendments to reduce soil loss or to immobilize P in soils or in wetlands;
- filter systems to remove P from field runoff;
- removal of N from runoff waters by fixation to innovative materials;
- use of sediment traps;
- capturing P in sediments.
Targeted and cost-effective use of such methods requires that we recognise the sources and transport routes of nutrients, critical steps in the load generating processes and the magnitude of responses in the rivers, lakes and coastal waters suffering from eutrophication. Moreover, the limitations, possible risks and side-effects must be evaluated.
This issue of MTT Science gathers together the abstracts of the oral and poster presentations given at the workshop.