7,824 research outputs found
Recommended from our members
Incremental learning of independent, overlapping, and graded concept descriptions with an instance-based process framework
Supervised learning algorithms make several simplifying assumptions concerning the characteristics of the concept descriptions to be learned. For example, concepts are often assumed to be (1) defined with respect to the same set of relevant attributes, (2) disjoint in instance space, and (3) have uniform instance distributions. While these assumptions constrain the learning task, they unfortunately limit an algorithm's applicability. We believe that supervised learning algorithms should learn attribute relevancies independently for each concept, allow instances to be members of any subset of concepts, and represent graded concept descriptions. This paper introduces a process framework for instance-based learning algorithms that exploit only specific instance and performance feedback information to guide their concept learning processes. We also introduce Bloom, a specific instantiation of this framework. Bloom is a supervised, incremental, instance-based learning algorithm that learns relative attribute relevancies independently for each concept, allows instances to be members of any subset of concepts, and represents graded concept memberships. We describe empirical evidence to support our claims that Bloom can learn independent, overlapping, and graded concept descriptions
Scalable approximate FRNN-OWA classification
Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximative nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes within query times that are up to several orders of magnitude faster
Providing Diversity in K-Nearest Neighbor Query Results
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)
queries return the K closest answers according to given distance metric in the
database with respect to Q. In this scenario, it is possible that a majority of
the answers may be very similar to some other, especially when the data has
clusters. For a variety of applications, such homogeneous result sets may not
add value to the user. In this paper, we consider the problem of providing
diversity in the results of KNN queries, that is, to produce the closest result
set such that each answer is sufficiently different from the rest. We first
propose a user-tunable definition of diversity, and then present an algorithm,
called MOTLEY, for producing a diverse result set as per this definition.
Through a detailed experimental evaluation on real and synthetic data, we show
that MOTLEY can produce diverse result sets by reading only a small fraction of
the tuples in the database. Further, it imposes no additional overhead on the
evaluation of traditional KNN queries, thereby providing a seamless interface
between diversity and distance.Comment: 20 pages, 11 figure
An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage
The results from most machine learning experiments are used for a specific
purpose and then discarded. This results in a significant loss of information
and requires rerunning experiments to compare learning algorithms. This also
requires implementation of another algorithm for comparison, that may not
always be correctly implemented. By storing the results from previous
experiments, machine learning algorithms can be compared easily and the
knowledge gained from them can be used to improve their performance. The
purpose of this work is to provide easy access to previous experimental results
for learning and comparison. These stored results are comprehensive -- storing
the prediction for each test instance as well as the learning algorithm,
hyperparameters, and training set that were used. Previous results are
particularly important for meta-learning, which, in a broad sense, is the
process of learning from previous machine learning results such that the
learning process is improved. While other experiment databases do exist, one of
our focuses is on easy access to the data. We provide meta-learning data sets
that are ready to be downloaded for meta-learning experiments. In addition,
queries to the underlying database can be made if specific information is
desired. We also differ from previous experiment databases in that our
databases is designed at the instance level, where an instance is an example in
a data set. We store the predictions of a learning algorithm trained on a
specific training set for each instance in the test set. Data set level
information can then be obtained by aggregating the results from the instances.
The instance level information can be used for many tasks such as determining
the diversity of a classifier or algorithmically determining the optimal subset
of training instances for a learning algorithm.Comment: 7 pages, 1 figure, 6 table
- …