Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs
Shapelets are discriminative time series subsequences that allow generation
of interpretable classification models, which provide faster and generally
better classification than the nearest neighbor approach. However, the shapelet
discovery process requires the evaluation of all possible subsequences of all
time series in the training set, making it extremely computationally intensive.
Consequently, shapelet discovery for large time series datasets quickly becomes
intractable. A number of improvements have been proposed to reduce the training
time. These techniques use approximation or discretization and often lead to
reduced classification accuracy compared to the exact method.
We propose the use of ensembles of shapelet-based classifiers obtained
using random sampling of the shapelet candidates. Random sampling reduces
the number of evaluated candidates and consequently the required computational
cost, while the classification accuracy of the resulting models is not
significantly different from that of the exact algorithm. The combination of
randomized classifiers rectifies the inaccuracies of individual models because
of the diversity of the solutions. The experiments show that the proposed
ensemble of inexpensive classifiers provides better classification accuracy
than the exact method at a significantly lower computational cost.
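The idea in the abstract above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`min_dist`, `fit_stump`, `fit_ensemble`, `predict`) are hypothetical, each ensemble member is a single-shapelet threshold rule rather than a full shapelet tree, candidates are scored by training accuracy rather than information gain, and all series are assumed to have equal length of at least `min_len`.

```python
import random
from collections import Counter

def min_dist(series, shapelet):
    """Smallest squared Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    return min(
        sum((series[s + i] - shapelet[i]) ** 2 for i in range(m))
        for s in range(len(series) - m + 1)
    )

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def sample_candidate(X, min_len, max_len, rng):
    """Draw one random subsequence instead of enumerating all of them."""
    series = rng.choice(X)
    length = rng.randint(min_len, min(max_len, len(series)))
    start = rng.randint(0, len(series) - length)
    return series[start:start + length]

def fit_stump(X, y, n_candidates, min_len, max_len, rng):
    """Keep the sampled shapelet and distance threshold with the best training accuracy."""
    best, best_acc = None, -1.0
    for _ in range(n_candidates):
        shp = sample_candidate(X, min_len, max_len, rng)
        dists = [min_dist(x, shp) for x in X]
        for thr in sorted(set(dists)):
            near = [t for d, t in zip(dists, y) if d <= thr]
            far = [t for d, t in zip(dists, y) if d > thr]
            near_label = majority(near)
            far_label = majority(far) if far else near_label
            acc = (near.count(near_label) + far.count(far_label)) / len(y)
            if acc > best_acc:
                best, best_acc = (shp, thr, near_label, far_label), acc
    return best

def fit_ensemble(X, y, n_models, n_candidates, min_len, max_len, seed=0):
    """Train several cheap randomized classifiers, each on its own candidate sample."""
    rng = random.Random(seed)
    return [fit_stump(X, y, n_candidates, min_len, max_len, rng) for _ in range(n_models)]

def predict(ensemble, x):
    """Majority vote over the individual shapelet rules."""
    votes = [near if min_dist(x, shp) <= thr else far for shp, thr, near, far in ensemble]
    return majority(votes)
```

Each member evaluates only `n_candidates` subsequences, so training cost grows with the sample size rather than with the number of all possible subsequences; the vote in `predict` is where the diversity of the members is meant to compensate for individual errors.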
Integrating Learning from Examples into the Search for Diagnostic Policies
This paper studies the problem of learning diagnostic policies from training
examples. A diagnostic policy is a complete description of the decision-making
actions of a diagnostician (i.e., tests followed by a diagnostic decision) for
all possible combinations of test results. An optimal diagnostic policy is one
that minimizes the expected total cost, which is the sum of measurement costs
and misdiagnosis costs. In most diagnostic settings, there is a tradeoff
between these two kinds of costs. This paper formalizes diagnostic decision
making as a Markov Decision Process (MDP). The paper introduces a new family of
systematic search algorithms based on the AO* algorithm to solve this MDP. To
make AO* efficient, the paper describes an admissible heuristic that enables
AO* to prune large parts of the search space. The paper also introduces several
greedy algorithms including some improvements over previously-published
methods. The paper then addresses the question of learning diagnostic policies
from examples. When the probabilities of diseases and test results are computed
from training data, there is a great danger of overfitting. To reduce
overfitting, regularizers are integrated into the search algorithms. Finally,
the paper compares the proposed methods on five benchmark diagnostic data sets.
The studies show that in most cases the systematic search methods produce
better diagnostic policies than the greedy methods. In addition, the studies
show that for training sets of realistic size, the systematic search algorithms
are practical on today's desktop computers.
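The cost criterion described in this abstract (measurement costs plus misdiagnosis costs, averaged over cases) can be illustrated with a small Python sketch. Everything here is assumed for illustration: the tuple-based policy encoding, the function names, and the toy probabilities and costs are not taken from the paper, and the sketch only evaluates a given policy rather than searching for an optimal one with AO*.

```python
def policy_cost(policy, case, test_costs, misdiagnosis_cost):
    """Cost of following the policy on one case: tests performed plus the final diagnosis."""
    node, total = policy, 0.0
    while node[0] == "test":
        _, test, branches = node
        total += test_costs[test]                 # measurement cost
        node = branches[case["results"][test]]    # follow the observed result
    _, diagnosis = node
    return total + misdiagnosis_cost[(diagnosis, case["disease"])]

def expected_total_cost(policy, cases, test_costs, misdiagnosis_cost):
    """Probability-weighted average cost over all (disease, test results) cases."""
    return sum(
        c["prob"] * policy_cost(policy, c, test_costs, misdiagnosis_cost) for c in cases
    )

# Hypothetical toy problem: one test "t" and two possible conditions.
cases = [
    {"prob": 0.25,  "disease": "sick",    "results": {"t": "pos"}},
    {"prob": 0.125, "disease": "sick",    "results": {"t": "neg"}},
    {"prob": 0.125, "disease": "healthy", "results": {"t": "pos"}},
    {"prob": 0.5,   "disease": "healthy", "results": {"t": "neg"}},
]
test_costs = {"t": 1.0}
# Keys are (diagnosis made, actual condition).
misdiagnosis_cost = {
    ("sick", "sick"): 0.0, ("healthy", "healthy"): 0.0,
    ("healthy", "sick"): 10.0, ("sick", "healthy"): 4.0,
}

no_test = ("diagnose", "healthy")
use_test = ("test", "t", {"pos": ("diagnose", "sick"), "neg": ("diagnose", "healthy")})

cost_no_test = expected_total_cost(no_test, cases, test_costs, misdiagnosis_cost)    # 3.75
cost_use_test = expected_total_cost(use_test, cases, test_costs, misdiagnosis_cost)  # 2.75
```

In this toy setting, paying for the test lowers the expected total cost because the reduction in misdiagnosis cost outweighs the measurement cost; with a more expensive test the comparison would reverse, which is the tradeoff the abstract refers to.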
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
sub-problems, and at each of these sub-problems making optimistic assumptions
with respect to the influence that will be exerted by the rest of the system.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.
Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
Mining data streams using option trees (revised edition, 2004)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be able to be used for data mining at any point in time.
This paper describes a data stream classification algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset.
A review of associative classification mining
Associative classification mining is a promising approach in data mining that utilizes the
association rule discovery techniques to construct classification systems, also known as
associative classifiers. In the last few years, a number of associative classification algorithms
have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms
employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule
evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative
classification techniques with regard to the above criteria. Finally, future directions in associative
classification, such as incremental learning and mining low-quality data sets, are also
highlighted in this paper.
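As a rough illustration of the rule ranking and rule prediction steps mentioned above, the following Python sketch scores class association rules by support and confidence and classifies with the highest-ranked matching rule. It is a generic sketch under stated assumptions, not an implementation of CPAR, CMAR, MCAR or MMAC: the function names, the ranking order (confidence, then support, then shorter antecedent) and the default-class fallback are all choices made for this example.

```python
def rule_stats(antecedent, label, rows):
    """Support and confidence of the rule `antecedent -> label`.

    rows: list of (frozenset of items, class label) training records.
    """
    covered = [cls for items, cls in rows if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(rows)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence

def rank_rules(rules, rows):
    """Order rules by confidence, then support, then antecedent length (shorter first)."""
    scored = [(rule_stats(a, c, rows), a, c) for a, c in rules]
    scored.sort(key=lambda r: (-r[0][1], -r[0][0], len(r[1])))
    return [(a, c) for _, a, c in scored]

def classify(ranked_rules, items, default):
    """Predict with the first ranked rule whose antecedent is contained in the record."""
    for antecedent, label in ranked_rules:
        if antecedent <= items:
            return label
    return default

# Hypothetical toy data and candidate rules.
rows = [
    (frozenset({"a", "b"}), "x"),
    (frozenset({"a"}), "x"),
    (frozenset({"b"}), "y"),
    (frozenset({"b", "c"}), "y"),
]
rules = [
    (frozenset({"b"}), "x"),
    (frozenset({"b"}), "y"),
    (frozenset({"a"}), "x"),
]
ranked = rank_rules(rules, rows)
```

Rule discovery and pruning are left out; in practice the candidate rules would come from an association rule miner, and the surveyed algorithms differ in exactly how they rank, prune and combine them.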
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.