Minimizing the error of linear separators on linearly inseparable data
Given linearly inseparable sets R of red points and B of blue points, we consider several
measures of how far they are from being separable. Intuitively, given a potential separator
(‘‘classifier’’), we measure its quality (‘‘error’’) according to how much work it would take
to move the misclassified points across the classifier to yield separated sets. We consider
several measures of work and provide algorithms to find linear classifiers that minimize
the error under these different measures. Ministerio de Educación y Ciencia MTM2008-05866-C03-0
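The idea of scoring a classifier by the work needed to move misclassified points across it can be illustrated with a minimal sketch. The abstract does not name the specific measures, so the four computed here (count, maximum distance, sum of distances, sum of squared distances) are illustrative assumptions, as are the function name `separator_errors` and the convention that red points belong on the side where a*x + b*y <= c.

```python
import math

def separator_errors(red, blue, a, b, c):
    """Score the line a*x + b*y = c as a separator of red and blue points.

    Assumed convention (not from the source): red points should satisfy
    a*x + b*y <= c and blue points a*x + b*y >= c. A misclassified point
    contributes its Euclidean distance to the line, i.e. the shortest
    move that would bring it back to the correct side.
    """
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("(a, b) must not be the zero vector")

    dists = []
    for x, y in red:
        signed = (a * x + b * y - c) / norm
        if signed > 0:          # red point on the blue side
            dists.append(signed)
    for x, y in blue:
        signed = (a * x + b * y - c) / norm
        if signed < 0:          # blue point on the red side
            dists.append(-signed)

    return {
        "count": len(dists),                  # number of misclassified points
        "max": max(dists, default=0.0),       # largest single move
        "sum": sum(dists),                    # total move distance
        "sum_sq": sum(d * d for d in dists),  # total squared move distance
    }

# Hypothetical data: one red and one blue point lie on the wrong side of x = 2.5.
red = [(0, 0), (1, 0), (3, 0)]
blue = [(4, 0), (5, 0), (2, 0)]
errors = separator_errors(red, blue, 1.0, 0.0, 2.5)
```

Minimizing any one of these quantities over all lines is the harder problem the paper addresses; this sketch only evaluates a given candidate line.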
Classification algorithms on the cell processor
The rapid advancement in the capacity and reliability of data storage technology has allowed for the retention of a virtually limitless quantity and detail of digital information. Massive information databases are becoming more and more widespread among governmental, educational, scientific, and commercial organizations. By segregating this data into carefully defined input (e.g., images) and output (e.g., classification labels) sets, a classification algorithm can be used to develop an internal expert model of the data by employing a specialized training algorithm. A properly trained classifier is capable of predicting the output for future input data from the same input domain that it was trained on. Two popular classifiers are Neural Networks and Support Vector Machines. Both, as with most accurate classifiers, require massive computational resources to carry out the training step and can take months to complete when dealing with extremely large data sets. In most cases, using a larger training set improves the final accuracy of the trained classifier. However, access to the kinds of computational resources required to do so is expensive and out of reach of private or underfunded institutions. The Cell Broadband Engine (CBE), developed by Sony, Toshiba, and IBM, has recently been introduced into the market. The least expensive current iteration is available in the Sony PlayStation 3® computer entertainment system. The CBE is a novel multi-core architecture which features many hardware enhancements designed to accelerate the processing of massive amounts of data. These characteristics and the cheap and widespread availability of this technology make the Cell a prime candidate for the task of training classifiers. In this work, the feasibility of using the Cell processor to train Neural Networks and Support Vector Machines was explored.
In the Neural Network family of classifiers, the fully connected Multilayer Perceptron and the Convolution Network were implemented. In the Support Vector Machine family, a Working Set technique known as the Gradient Projection-based Decomposition Technique, as well as the Cascade SVM, were implemented.
Machine Learning Methods for Computational Sustainability
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important and representative issues in the study of sustainability: detecting the presence of species, modeling the distribution of species, and protecting endangered species. For these three issues, this thesis selects three typical problems as the main focus and studies these problems with different machine learning techniques. Specifically, this thesis investigates the problem of detecting bird species from bird song recordings, the problem of modeling migrating birds at the population level, and the problem of designing a conservation area for an endangered species. First, this thesis models the problem of bird song classification as a weakly-supervised learning problem and develops a probabilistic classification model for the learning problem. The thesis also analyzes the learnability of the superset label learning problem to determine conditions under which one can learn a good classifier from the training data. Second, the thesis models bird migration with a probabilistic graphical model at the population level using a Collective Graphical Model (CGM). The thesis proposes a Gaussian approximation to significantly improve the inference efficiency of the model. Theoretical results show that the proposed Gaussian approximation is correct and can be calculated efficiently. Third, the thesis studies a typical reserve design problem with a novel formulation of transductive classification. The thesis then solves the formulation with two optimization algorithms. The learning techniques in this thesis are general and can also be applied to many other machine learning problems.
Separating bichromatic point sets in the plane by restricted orientation convex hulls
The version of record is available online at: http://dx.doi.org/10.1007/s10898-022-01238-9
We explore the separability of point sets in the plane by a restricted-orientation convex hull, which is an orientation-dependent, possibly disconnected, and non-convex enclosing shape that generalizes the convex hull. Let R and B be two disjoint sets of red and blue points in the plane, and O be a set of k ≥ 2 lines passing through the origin. We study the problem of computing the set of orientations of the lines of O for which the O-convex hull of R contains no points of B. For k = 2 orthogonal lines we have the rectilinear convex hull. In optimal O(n log n) time and O(n) space, n = |R| + |B|, we compute the set of rotation angles such that, after simultaneously rotating the lines of O around the origin in the same direction, the rectilinear convex hull of R contains no points of B. We generalize this result to the case where O is formed by k ≥ 2 lines with arbitrary orientations. In the counter-clockwise circular order of the lines of O, let α_i be the angle required to clockwise rotate the i-th line so it coincides with its successor. We solve the problem in this case in O(1/Θ · N log N) time and O(1/Θ · N) space, where Θ = min{α_1, …, α_k} and N = max{k, |R| + |B|}. We finally consider the case in which O is formed by k = 2 lines, one of the lines is fixed, and the second line rotates by an angle that goes from 0 to π. We show that this last case can also be solved in optimal O(n log n) time and O(n) space, where n = |R| + |B|.
Carlos Alegría: Research supported by MIUR Proj. “AHeAD” no 20174LF3T8. David Orden: Research supported by Project PID2019-104129GB-I00 / AEI / 10.13039/501100011033 of the Spanish Ministry of Science and Innovation. Carlos Seara: Research supported by Project PID2019-104129GB-I00 / AEI / 10.13039/501100011033 of the Spanish Ministry of Science and Innovation. Jorge Urrutia: Research supported in part by SEP-CONACYT. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska–Curie Grant Agreement No 734922. Peer Reviewed. Postprint (published version).
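To make the rectilinear case concrete, here is a brute-force sketch for a single fixed rotation angle. It is not the paper's O(n log n) algorithm, which handles all angles at once. It assumes the common definition of the rectilinear convex hull of R as the plane minus every open axis-aligned quadrant that contains no point of R; under that definition a point lies in the hull exactly when each of the four closed quadrants with apex at that point contains a point of R. The function names are hypothetical.

```python
import math

def in_rectilinear_hull(p, red, theta=0.0):
    """Return True if p lies in the rectilinear convex hull of `red`
    when the two orthogonal lines are rotated counter-clockwise by theta.

    Assumed definition: p is in the hull iff every closed quadrant with
    apex p (in the rotated frame) contains at least one red point.
    Comparisons are exact, so degenerate inputs at theta != 0 may need
    a tolerance in practice.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of each red point relative to p, in the rotated frame.
    rel = []
    for qx, qy in red:
        dx, dy = qx - p[0], qy - p[1]
        rel.append((c * dx + s * dy, -s * dx + c * dy))

    quadrants = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
    return all(
        any(sx * u >= 0 and sy * v >= 0 for u, v in rel)
        for sx, sy in quadrants
    )

def hull_separates(red, blue, theta=0.0):
    """True if the rotated rectilinear hull of red contains no blue point.
    Runs in O(|R| * |B|) time for one angle."""
    return not any(in_rectilinear_hull(b, red, theta) for b in blue)

# Two red points on a diagonal: the midpoint is inside the ordinary convex
# hull (a segment) but outside the rectilinear hull, which is disconnected.
diagonal = [(0, 0), (2, 2)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

With these hypothetical inputs, `hull_separates(diagonal, [(1, 1)])` holds while `hull_separates(square, [(1, 1)])` does not, which shows how the enclosing shape depends on more than the extreme points.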
Efficient piecewise linear classifiers and applications
Supervised learning has become an essential part of data mining for industry, military, science and academia. Classification, a type of supervised learning, allows a machine to learn from data and then predict certain behaviours, variables or outcomes. Classification can be used to solve many problems, including the detection of malignant cancers and potentially bad creditors, and even enabling autonomy in robots. The ability to collect and store large amounts of data has increased significantly over the past few decades. However, the ability of classification techniques to deal with large scale data has not kept pace. Many data transformation and reduction schemes have been tried with mixed success. This problem is further exacerbated when dealing with real time classification in embedded systems. The real time classifier must classify using only limited processing, memory and power resources. Piecewise linear boundaries are known to provide efficient real time classifiers. They have low memory requirements, require little processing effort, are parameterless and classify in real time. Piecewise linear functions are used to approximate non-linear decision boundaries between pattern classes. Finding these piecewise linear boundaries is a difficult optimization problem that can require a long training time. Multiple optimization approaches have been used for real time classification, but they can lead to suboptimal piecewise linear boundaries. This thesis develops three real time piecewise linear classifiers that deal with large scale data. Each classifier uses a single optimization algorithm in conjunction with an incremental approach that reduces the number of points as the decision boundaries are built. Two of the classifiers further reduce complexity by augmenting the incremental approach with additional schemes. One scheme uses hyperboxes to identify points inside the so-called “indeterminate” regions.
The other uses a polyhedral conic set to identify data points lying on or close to the boundary. All other points are excluded from the process of building the decision boundaries. The three classifiers are applied to real time data classification problems and the results of numerical experiments on real world data sets are reported. These results demonstrate that the new classifiers require a reasonable training time and their test set accuracy is consistently good on most data sets compared with current state of the art classifiers. Doctor of Philosophy
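The claim that piecewise linear boundaries are cheap to evaluate can be seen in a small sketch. One common way to represent such a boundary is a max-min of affine functions; the abstract does not say which representation the thesis uses, so this form, the function names, and the hand-picked weights are all assumptions for illustration. No training is shown, only prediction.

```python
def max_min_score(x, groups):
    """Evaluate f(x) = max over groups of (min over (w, b) of w.x + b).

    Each group is a list of (weights, bias) pairs and describes one convex
    polyhedral piece; the max over groups takes the union of the pieces.
    """
    return max(
        min(sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in group)
        for group in groups
    )

def classify(x, groups):
    """Label +1 where the max-min function is positive, otherwise -1."""
    return 1 if max_min_score(x, groups) > 0 else -1

# Hand-picked (hypothetical) boundary: the positive class is the union of
# the regions {x > 1, y > 1} and {x < -1, y < -1}, which no single line
# can separate from the rest of the plane.
groups = [
    [((1, 0), -1), ((0, 1), -1)],    # x - 1 > 0 and y - 1 > 0
    [((-1, 0), -1), ((0, -1), -1)],  # -x - 1 > 0 and -y - 1 > 0
]
labels = [classify(p, groups) for p in [(2, 2), (-2, -2), (2, -2), (0, 0)]]
```

Prediction needs only a handful of dot products and comparisons and stores just the weights and biases, which is consistent with the low memory and processing requirements described above.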