12,855 research outputs found
Positive Semidefinite Metric Learning Using Boosting-like Algorithms
The success of many machine learning and pattern recognition methods relies
heavily upon the identification of an appropriate distance metric on the input
data. It is often beneficial to learn such a metric from the input training
data, instead of using a default one such as the Euclidean distance. In this
work, we propose a boosting-based technique, termed BoostMetric, for learning a
quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance
metric requires enforcing the constraint that the matrix parameter to the
metric remains positive definite. Semidefinite programming is often used to
enforce this constraint, but does not scale well and easy to implement.
BoostMetric is instead based on the observation that any positive semidefinite
matrix can be decomposed into a linear combination of trace-one rank-one
matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak
learners within an efficient and scalable boosting-based learning process. The
resulting methods are easy to implement, efficient, and can accommodate various
types of constraints. We extend traditional boosting algorithms in that its
weak learner is a positive semidefinite matrix with trace and rank being one
rather than a classifier or regressor. Experiments on various datasets
demonstrate that the proposed algorithms compare favorably to those
state-of-the-art methods in terms of classification accuracy and running time.Comment: 30 pages, appearing in Journal of Machine Learning Researc
VizRank: Data Visualization Guided by Machine Learning
Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections showing different attribute subsets that must be evaluated by the data analyst. In this paper, we introduce a method called VizRank, which is applied on classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of k-nearest neighbor classifier on the data set consisting of x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections highly agrees with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics
- …