1,286 research outputs found
Implementation and scalability analysis of balancing domain decomposition methods
Manuscript submitted for publication in the SIAM Journal on Scientific Computing; under review. Preprint.
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour classifier: classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because its traditionally poor
run-time performance is much less of a problem given the computational power
now available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours, and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time series, retrieval
speed-up and intrinsic dimensionality have been added. An appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures; an updated edition of an older tutorial on kNN
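The nearest-neighbour scheme the abstract describes (find the k closest training examples, let them vote) can be sketched in a few lines of NumPy. This is a generic illustration under Euclidean distance, not the Python code accompanying the paper:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify a query point by majority vote among its k nearest
    training examples under Euclidean distance."""
    dists = np.linalg.norm(X_train - query, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority class wins

# Toy data: two small clusters, one per class
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), k=3))  # → 0
```

The brute-force distance computation here is exactly the run-time cost the paper's sections on retrieval speed-up aim to reduce.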
Implementation and Scalability Analysis of Balancing Domain Decomposition Methods
In this paper we present a detailed description of a high-performance distributed-memory implementation of balancing domain decomposition preconditioning techniques. This coverage provides a pool of implementation hints and considerations that can be very useful for scientists who wish to tackle large-scale distributed-memory machines using these methods. In addition, the paper includes a comprehensive performance and scalability study of the resulting codes when applied to the solution of the Poisson problem on a large-scale multicore-based distributed-memory machine with up to 4096 cores. Well-known theoretical results guarantee the optimality (algorithmic scalability) of these preconditioning techniques in weak-scaling scenarios, as they are able to keep the condition number of the preconditioned operator bounded by a constant with fixed load per core and an increasing number of cores. The experimental study presented in the paper complements this mathematical analysis and answers how far these methods can go in the number of cores and the scale of the problem while remaining within reasonable ranges of efficiency on current distributed-memory machines. Moreover, for those scenarios where poor scalability is expected, the study precisely identifies, quantifies and justifies the main sources of inefficiency.
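The weak-scaling efficiency this kind of study reports is conventionally the ratio of the reference runtime to the runtime on P cores at fixed load per core. A minimal helper, with made-up timings purely for illustration (not figures from the paper):

```python
def weak_scaling_efficiency(t_ref, t_p):
    """Weak-scaling (parallel) efficiency: reference runtime divided by
    the runtime on P cores with the same load per core. Ideal scaling
    gives 1.0; growing runtimes push the ratio below 1."""
    return t_ref / t_p

# Hypothetical runtimes (seconds) at 1, 512 and 4096 cores, fixed load per core
timings = {1: 10.0, 512: 11.5, 4096: 14.0}
for cores, t in timings.items():
    print(f"{cores:5d} cores: efficiency {weak_scaling_efficiency(timings[1], t):.2f}")
```

A bounded condition number of the preconditioned operator keeps iteration counts flat as cores grow, which is what lets this ratio stay near 1 in the weak-scaling regime.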
Instance selection of linear complexity for big data
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets.
In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n²), or log-linear, O(n log n)) means that they are unable to process large data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared against some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
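The linear-time idea rests on locality-sensitive hashing: instances are hashed once, and only instances sharing a bucket are treated as similar, so no pairwise O(n²) comparison is needed. A sketch using random-projection LSH, one common LSH family (the paper's exact hashing scheme and selection criteria may differ; keeping one representative per bucket is a deliberately naive stand-in for its selection step):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_buckets(X, n_planes=8):
    """Random-projection LSH: hash each instance to a binary signature
    given by the signs of its projections onto random hyperplanes.
    Instances with equal signatures share a bucket, so candidate
    similar groups are found in O(n) time."""
    planes = rng.normal(size=(X.shape[1], n_planes))
    signs = (X @ planes) > 0                       # n x n_planes boolean signatures
    buckets = {}
    for i, sig in enumerate(signs):
        buckets.setdefault(sig.tobytes(), []).append(i)
    return buckets

# Two tight, well-separated clusters of instances
X = np.vstack([rng.normal(5, 0.1, (5, 3)), rng.normal(-5, 0.1, (5, 3))])
buckets = lsh_buckets(X)
selected = [idxs[0] for idxs in buckets.values()]  # keep one representative per bucket
print(f"{len(X)} instances reduced to {len(selected)}")
```

Hashing and the single pass over instances are both linear in n, which is the complexity claim the abstract makes.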
Biometric Authentication using Nonparametric Methods
Physiological and behavioural traits are employed to develop biometric
authentication systems. The proposed work deals with the authentication of iris
and signature based on a minimum-variance criterion. The iris patterns are
preprocessed based on the area of the connected components. The segmented image
used for authentication consists of the region with large variations in the
gray-level values. The image region is split into quadtree components. The
components with minimum variance are determined from the training samples. Hu
moments are applied to the components. The summation of the moment values
corresponding to the minimum-variance components is provided as the input
vector to k-means and fuzzy k-means classifiers. The best performance was
obtained for the MMU database consisting of 45 subjects: the number of subjects
with zero False Rejection Rate (FRR) was 44 and the number of subjects with
zero False Acceptance Rate (FAR) was 45. This paper also addresses the
reduction of computational load in off-line signature verification based on
minimal features using k-means, fuzzy k-means, k-NN, fuzzy k-NN and a novel
average-max approach. An FRR of 8.13% and an FAR of 10% were achieved using the
k-NN classifier. The signature is a biometric where variation between genuine
instances is a natural expectation: certain parts of a genuine signature vary
from one instance to another. The system aims to be simple, fast and robust
while using fewer features than state-of-the-art works.
Comment: 20 pages
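The quadtree step can be illustrated with a small NumPy sketch: split the image region into quadrants and rank the resulting components by gray-level variance. This is a generic illustration only; the paper's full pipeline additionally applies Hu moments to the selected components and feeds their sums to k-means / fuzzy k-means:

```python
import numpy as np

def quadtree_components(img, depth=1):
    """Recursively split an image into four quadrants, returning the
    leaf blocks at the requested depth."""
    if depth == 0:
        return [img]
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    quads = [img[:h2, :w2], img[:h2, w2:], img[h2:, :w2], img[h2:, w2:]]
    return [leaf for q in quads for leaf in quadtree_components(q, depth - 1)]

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64)).astype(float)
img[:32, :32] = 128.0                  # make the top-left quadrant flat (zero variance)

blocks = quadtree_components(img, depth=1)
variances = [b.var() for b in blocks]
print(int(np.argmin(variances)))       # → 0: the flat top-left block has least variance
```

Selecting the minimum-variance components, as the abstract describes, amounts to taking the blocks with the smallest entries of `variances` over the training samples.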
The Hybrid Dynamic Prototype Construction and Parameter Optimization with Genetic Algorithm for Support Vector Machine
The optimized hybrid artificial intelligence model is a potential tool for construction engineering and management problems. The support vector machine (SVM) has achieved excellent performance in a wide variety of applications. Nevertheless, how to effectively reduce the training complexity of the SVM is still a serious challenge. In this paper, a novel order-independent approach for instance selection, called the dynamic condensed nearest neighbor (DCNN) rule, is proposed to adaptively construct prototypes in the training dataset and to reduce the redundant or noisy instances in a classification process for the SVM. Furthermore, a hybrid model based on the genetic algorithm (GA) is proposed to simultaneously optimize the prototype construction and the SVM kernel parameter settings to enhance the classification accuracy. Several UCI benchmark datasets are considered to compare the proposed hybrid GA-DCNN-SVM approach with a previously published GA-based method. The experimental results illustrate that the proposed hybrid model outperforms the existing method and effectively improves the classification performance of the SVM.
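The prototype-construction idea builds on Hart's classic condensed nearest neighbor (CNN) rule, which keeps only the instances the current prototype set misclassifies under 1-NN. A minimal sketch of classic, order-dependent CNN is shown below; the paper's DCNN is an order-independent dynamic variant, and the GA wrapper that tunes it jointly with the SVM kernel parameters is omitted:

```python
import numpy as np

def condensed_nn(X, y):
    """Hart's condensed nearest-neighbor rule: repeatedly add any
    instance that the 1-NN rule over the current prototype set
    misclassifies, until a full pass adds nothing. The retained
    prototypes form a reduced training set."""
    keep = [0]                                   # seed with the first instance
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:               # misclassified -> becomes a prototype
                keep.append(i)
                changed = True
    return np.array(keep)

# Two well-separated classes: interior points become redundant
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
proto = condensed_nn(X, y)
print(len(proto), "prototypes kept out of", len(X))
```

Training an SVM on the reduced prototype set instead of the full data is what cuts the training complexity the abstract targets.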