    The Potential of Restarts for ProbSAT

    This work analyses the potential of restarts for probSAT, a fairly successful algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT instances close to the phase transition. We estimate an optimal restart time from empirical data, reaching a potential speedup factor of 1.39. Calculating restart times from fitted probability distributions reduces this factor to at most 1.30. A spin-off result is that the Weibull distribution approximates the runtime distribution well for over 93% of the instances used. A machine learning pipeline is presented that computes a restart time for a fixed-cutoff strategy to exploit this potential. Its main components are a random forest that determines the distribution type and a neural network that predicts the distribution's parameters. With the presented approach, probSAT performs statistically significantly better than with Luby's restart strategy or with no restarts at all. The approach is particularly advantageous on hard problems.
    Comment: Eurocast 201
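    The fixed-cutoff idea can be illustrated with a small sketch. It assumes independent runtime samples (for example, flip counts) of a solver run without restarts on a single instance, and uses the standard identity that restarting every t steps gives an expected runtime of E[min(T, t)] / P(T <= t). The function names `expected_runtime_with_cutoff`, `best_cutoff` and `luby` are hypothetical, and this is not the paper's pipeline, which predicts the cutoff from a fitted distribution rather than searching over samples.

```python
# Sketch (assumptions): i.i.d. runtime samples of a Las Vegas solver on one instance.
from typing import List, Tuple


def expected_runtime_with_cutoff(runtimes: List[float], t: float) -> float:
    """Estimate E[min(T, t)] / P(T <= t), the expected runtime when restarting every t steps."""
    n = len(runtimes)
    solved = sum(1 for r in runtimes if r <= t)
    if solved == 0:
        return float("inf")  # no sampled run finishes within t
    mean_truncated = sum(min(r, t) for r in runtimes) / n
    return mean_truncated / (solved / n)


def best_cutoff(runtimes: List[float]) -> Tuple[float, float]:
    """Return (cutoff, estimated speedup over running without restarts).

    Only observed runtimes are tried: between two consecutive observations the
    numerator grows while P(T <= t) stays fixed, so the minimum sits on a sample.
    """
    candidates = sorted(set(runtimes))
    best_t = min(candidates, key=lambda t: expected_runtime_with_cutoff(runtimes, t))
    no_restart = sum(runtimes) / len(runtimes)
    return best_t, no_restart / expected_runtime_with_cutoff(runtimes, best_t)


def luby(i: int) -> int:
    """i-th term (1-based) of Luby's sequence 1, 1, 2, 1, 1, 2, 4, ... (scaled by a base cutoff in practice)."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)
```

    The estimated speedup is never below 1, because the largest observed runtime is itself a candidate cutoff and reproduces the no-restart mean; it rises well above 1 when a few sampled runs are very long.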

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to consider, and infinitely many once weighted combinations are allowed, even if only linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes, for example, to understand the distribution of crime rates across geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe those areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to one or more real-valued target attributes. The subgroup descriptions are expressed in terms of a succinct set of other attributes of arbitrary type. The approach is based on the Subjective Interestingness framework FORSIED, which enables the use of prior knowledge when finding the most informative non-redundant patterns; hence the method also supports iterative data mining.
    Comment: 12 pages, 10 figures, 2 tables, conference submissio
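    To make the idea of scoring a subgroup by information content relative to description length concrete, here is a deliberately simplified sketch, not the paper's method. It assumes a single real-valued target, equality conditions on the other attributes, a Gaussian background model for the subgroup mean, information content taken as the negative log2 of the two-sided tail probability of the observed mean, and a description length of `alpha * (number of conditions) + beta` with hypothetical weights. Unlike FORSIED, it never updates the background model with patterns already found, so it does not support iterative mining. All names (`si_score`, `best_single_condition`) are hypothetical.

```python
# Toy IC/DL subgroup score; a simplification, not FORSIED's actual model.
import math
from statistics import mean, pstdev
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Condition = Tuple[str, Any]  # hypothetical: (attribute, value) meaning attribute == value


def si_score(rows: List[Row], target: str, conditions: Sequence[Condition],
             alpha: float = 1.0, beta: float = 1.0) -> float:
    """Information content of the subgroup mean divided by a description length.

    Background model (assumption): the subgroup mean of `target` follows
    N(mu, sigma^2 / n), with mu and sigma estimated from the whole data set.
    """
    values = [float(r[target]) for r in rows]
    mu, sigma = mean(values), pstdev(values)
    group = [float(r[target]) for r in rows
             if all(r[attr] == val for attr, val in conditions)]
    if not group or sigma == 0:
        return 0.0
    z = (mean(group) - mu) / (sigma / math.sqrt(len(group)))
    # Two-sided tail probability of observing a mean at least this extreme.
    p = math.erfc(abs(z) / math.sqrt(2))
    ic = math.inf if p == 0 else -math.log2(p)
    # Description length grows with the number of conditions (hypothetical weights).
    return ic / (alpha * len(conditions) + beta)


def best_single_condition(rows: List[Row], target: str) -> Condition:
    """Exhaustively score every one-condition subgroup description."""
    attrs = [a for a in rows[0] if a != target]
    candidates = {(a, r[a]) for r in rows for a in attrs}
    return max(candidates, key=lambda c: si_score(rows, target, [c]))
```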

    Telling Cause from Effect using MDL-based Local and Global Regression

    We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult because standard conditional independence tests between the variables cannot be used. To tackle this problem, we follow an information-theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to obtain a practical solution. In particular, we propose a compression scheme that encodes local and global functional relations using MDL-based regression. We infer that X causes Y if it is shorter to describe Y as a function of X than the other way around. In addition, we introduce Slope, an efficient linear-time algorithm, and through a thorough empirical evaluation on both synthetic and real-world data we show that it outperforms the state of the art by a wide margin.
    Comment: 10 pages, To appear in ICDM1
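    The decision rule (prefer the direction with the shorter description) can be sketched as a toy two-part MDL comparison. This is not Slope, whose encoding of local and global functions differs. Here both variables are standardized, each direction is modeled by the best polynomial of degree 1 to 3 under a Gaussian residual code, and each parameter costs (1/2) log2 n bits; these choices and all function names are assumptions. Because both standardized marginals receive the same Gaussian code length under this scheme, the comparison reduces to the two conditional costs.

```python
# Toy two-part MDL direction test; an illustration, not the Slope algorithm.
import math
from statistics import mean, pstdev
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c0..c_degree via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(a, b)


def gaussian_bits(residuals: Sequence[float]) -> float:
    """Differential code length of residuals under N(0, mean square); only differences matter."""
    var = max(sum(r * r for r in residuals) / len(residuals), 1e-12)  # floor for exact fits
    return 0.5 * len(residuals) * math.log2(2 * math.pi * math.e * var)


def conditional_bits(xs: Sequence[float], ys: Sequence[float], max_degree: int = 3) -> float:
    """Cheapest model cost + residual cost of describing ys as a polynomial of xs."""
    n = len(xs)
    best = math.inf
    for d in range(1, max_degree + 1):
        coeffs = polyfit(xs, ys, d)
        res = [y - sum(c * x ** i for i, c in enumerate(coeffs)) for x, y in zip(xs, ys)]
        best = min(best, gaussian_bits(res) + 0.5 * (d + 1) * math.log2(n))
    return best


def standardize(v: Sequence[float]) -> List[float]:
    mu, sd = mean(v), pstdev(v)
    return [(a - mu) / sd for a in v]


def infer_direction(x: Sequence[float], y: Sequence[float]) -> str:
    """Return "X->Y" if Y is cheaper to describe given X than the reverse."""
    xs, ys = standardize(x), standardize(y)
    return "X->Y" if conditional_bits(xs, ys) < conditional_bits(ys, xs) else "Y->X"
```

    For instance, with y = x^3 sampled on an evenly spaced grid, the cubic fit of Y on X is exact while X cannot be written as a low-degree polynomial of Y, so the sketch prefers X->Y.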