65,105 research outputs found
Explicit Model Checking of Very Large MDP using Partitioning and Secondary Storage
The applicability of model checking is hindered by the state space explosion
problem in combination with limited amounts of main memory. To extend its
reach, the large available capacities of secondary storage such as hard disks
can be exploited. Due to the specific performance characteristics of secondary
storage technologies, specialised algorithms are required. In this paper, we
present a technique to use secondary storage for probabilistic model checking
of Markov decision processes. It combines state space exploration based on
partitioning with a block-iterative variant of value iteration over the same
partitions for the analysis of probabilistic reachability and expected-reward
properties. A sparse matrix-like representation is used to store partitions on
secondary storage in a compact format. All file accesses are sequential, and
compression can be used without affecting runtime. The technique has been
implemented within the Modest Toolset. We evaluate its performance on several
benchmark models of up to 3.5 billion states. In the analysis of time-bounded
properties on real-time models, our method neutralises the state space
explosion induced by the time bound in its entirety.Comment: The final publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-24953-7_1
Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRPs favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS504 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Dynamic load balancing in parallel KD-tree k-means
One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis.
Techniques for improving the efficiency of k-Means have been
largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing
issue. Three solutions have been developed and tested. Two
approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy
An optimization framework for solving capacitated multi-level lot-sizing problems with backlogging
This paper proposes two new mixed integer programming models for capacitated multi-level lot-sizing problems with backlogging, whose linear programming relaxations provide good lower bounds on the optimal solution value. We show that both of these strong formulations yield the same lower bounds. In addition to these theoretical results, we propose a new, effective optimization framework that achieves high quality solutions in reasonable computational time. Computational results show that the proposed optimization framework is superior to other well-known approaches on several important performance dimensions
- …