8,295 research outputs found
The Potential of Restarts for ProbSAT
This work analyses the potential of restarts for probSAT, a quite successful
algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT
instances that are close to the phase transition. We estimate an optimal
restart time from empirical data, reaching a potential speedup factor of 1.39.
Calculating restart times from fitted probability distributions reduces this
factor to a maximum of 1.30. A spin-off result is that the Weibull distribution
approximates the runtime distribution for over 93% of the used instances well.
A machine learning pipeline is presented to compute a restart time for a
fixed-cutoff strategy to exploit this potential. The main components of the
pipeline are a random forest for determining the distribution type and a neural
network for the distribution's parameters. ProbSAT performs statistically
significantly better than Luby's restart strategy and the policy without
restarts when using the presented approach. The structure is particularly
advantageous on hard problems.Comment: Eurocast 201
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
Information Theoretic sense) with respect to a single or set of real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding most informative non-redundant patterns, and hence the method also
supports iterative data mining.Comment: 12 pages, 10 figures, 2 tables, conference submissio
Telling Cause from Effect using MDL-based Local and Global Regression
We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables and from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer causes in case it is shorter to describe as a function of
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm that through thorough empirical evaluation on both
synthetic and real world data we show outperforms the state of the art by a
wide margin.Comment: 10 pages, To appear in ICDM1
- …