Mining for Unknown Unknowns
Unknown unknowns are future relevant contingencies that lack an ex ante
description. While there are numerous retrospective accounts showing that
significant gains or losses might have been achieved or avoided had such
contingencies been previously uncovered, getting hold of unknown unknowns still
remains elusive, both in practice and conceptually. Using Formal Concept
Analysis (FCA) - a subfield of lattice theory which is increasingly applied for
mining and organizing data - this paper introduces a simple framework to
systematically think out of the box and direct the search for unknown unknowns.
Comment: In Proceedings TARK 2023, arXiv:2307.0400
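As a rough illustration of the objects FCA works with, the sketch below enumerates the formal concepts (extent, intent pairs) of a tiny binary context by brute force. The function name `formal_concepts` and the toy context are hypothetical and are not taken from the paper; the enumeration is exponential in the number of objects and is meant only for very small examples.

```python
from itertools import combinations


def formal_concepts(context):
    """Return the set of formal concepts of a context.

    `context` maps each object to the set of attributes it has.
    A concept is a pair (extent, intent) where the intent is exactly the
    set of attributes shared by the extent, and the extent is exactly the
    set of objects having every attribute of the intent.
    """
    objects = list(context)
    attributes = set().union(*context.values()) if context else set()
    concepts = set()
    # Every intent is the set of attributes common to some group of objects
    # (the empty group yields the full attribute set).
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            intent = set(attributes)
            for g in group:
                intent &= context[g]
            # Close the group: all objects that have the whole intent.
            extent = frozenset(g for g in objects if intent <= context[g])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical toy context: three objects, three attributes.
toy = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"y"}}
concepts = formal_concepts(toy)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting pairs by inclusion of their extents gives the concept lattice of the context.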
Size of random Galois lattices and number of frequent itemsets
19 pages. We compute the mean and the variance of the size of the Galois lattice built from a random matrix with i.i.d. Bernoulli(p) entries. Then, observing that closed frequent itemsets are in bijection with winning coalitions, we compute the mean and the variance of the number of closed frequent itemsets. This can be of interest for mining association rules.
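To make the first quantity concrete, here is a minimal sketch of the mean lattice size for an n x m matrix with i.i.d. Bernoulli(p) entries. It uses a direct counting argument that is not necessarily the form given in the paper: a pair (A, B) with |A| = j rows and |B| = k columns is a concept exactly when A x B is all ones, no column outside B is all ones on A, and no row outside A is all ones on B. These three events involve disjoint entries, so their probabilities multiply. The function names are hypothetical, and the brute-force check is only feasible for very small matrices.

```python
from itertools import product
from math import comb


def count_concepts(rows, m):
    """Size of the Galois lattice of a binary matrix given as row bitmasks.

    The concepts correspond to the distinct intersections of rows, with the
    empty intersection taken to be the full attribute set.
    """
    intents = {(1 << m) - 1}
    for r in rows:
        intents |= {r & i for i in intents}
    return len(intents)


def expected_concepts(n, m, p):
    """Mean lattice size, summing P[(A, B) is a concept] over all sizes."""
    total = 0.0
    for j in range(n + 1):
        for k in range(m + 1):
            total += (comb(n, j) * comb(m, k)
                      * p ** (j * k)                # A x B is all ones
                      * (1 - p ** j) ** (m - k)     # no extra column fits A
                      * (1 - p ** k) ** (n - j))    # no extra row fits B
    return total


def brute_force_mean(n, m, p):
    """Exact mean by enumerating all 2^(n*m) matrices (tiny n, m only)."""
    mean = 0.0
    for bits in product((0, 1), repeat=n * m):
        rows = [sum(bits[i * m + a] << a for a in range(m)) for i in range(n)]
        ones = sum(bits)
        weight = p ** ones * (1 - p) ** (n * m - ones)
        mean += weight * count_concepts(rows, m)
    return mean


print(expected_concepts(2, 3, 0.3), brute_force_mean(2, 3, 0.3))
```

For a 1 x 1 matrix the sum reduces to 2 - p: one concept when the single entry is 1, two when it is 0.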
MLE for the parameters of bivariate interval-valued models
With contemporary data sets becoming too large to analyze the data directly,
various forms of aggregated data are becoming common. The original individual
data are points, but after aggregation, the observations are interval-valued
(e.g.). While some researchers simply analyze the set of averages of the
observations by aggregated class, it is easily established that this approach
ignores much of the information in the original data set. The initial
theoretical work for interval-valued data was that of Le-Rademacher and Billard
(2011), but those results were limited to estimation of the mean and variance
of a single variable only. This article seeks to redress the limitation of
their work by deriving the maximum likelihood estimator for the all-important
covariance statistic, a basic requirement for numerous methodologies, such as
regression, principal components, and canonical analyses. Asymptotic properties
of the proposed estimators are established. The Le-Rademacher and Billard
results emerge as special cases of our wider derivations.
Comment: Will appear in ADA