65 research outputs found
Fast optimization of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a density-based model which searches for a linear projection maximizing the Cauchy-Schwarz Divergence of the dataset's kernel density estimation. Despite its good empirical results, one of its drawbacks is the optimization speed. In this paper we analyze how it can be sped up by solving an approximate problem. We analyze two methods, both similar to the approximate solutions of kernel density estimation querying, and provide adaptive schemes for selecting the crucial parameters based on a user-specified acceptable error. Furthermore, we show how one can exploit the well-known conjugate gradient and L-BFGS optimizers despite the fact that the original optimization problem should be solved on the sphere. All of the above methods and modifications are tested on 10 real-life datasets from the UCI repository to confirm their practical usability.
Comment: Presented at Theoretical Foundations of Machine Learning 2015 (http://tfml.gmum.net); final version published in Schedae Informaticae Journal
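The abstract's trick of reusing unconstrained optimizers for a problem constrained to the sphere can be illustrated with a minimal projected-gradient sketch. The objective below is a toy stand-in (maximize (a·w)² over unit vectors w), not the actual MELC Cauchy-Schwarz objective; `a`, `w`, and the step size `eta` are illustrative choices.

```python
import math

# Toy sketch: maximize f(w) = (a . w)^2 over the unit sphere by taking an
# ordinary Euclidean gradient step and then projecting back onto the sphere
# (renormalizing). This is the simplest way to reuse unconstrained-style
# updates for a sphere-constrained problem; the optimum here is w = +/- a/||a||.
a = [3.0, 4.0]
w = [1.0, 0.0]
eta = 0.01

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

for _ in range(200):
    g = [2 * dot(a, w) * ai for ai in a]        # gradient of (a . w)^2
    w = [wi + eta * gi for wi, gi in zip(w, g)]  # unconstrained step
    n = math.sqrt(dot(w, w))
    w = [wi / n for wi in w]                     # project back onto the sphere
```

In this 2-D example `w` converges to `a/||a|| = [0.6, 0.8]`; real implementations would instead hand a reparameterized or projected objective to CG or L-BFGS, as the paper discusses.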
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term in Puslitbang tekMIRA
Indonesian government agencies under the Ministry of Energy and Mineral Resources face problems in classifying the coal data dictionary. This research groups the coal dictionary using the K-Means and Mean Shift algorithms. The K-Means algorithm is used to obtain cluster values on the character and word criteria. The Euclidean-distance data from the final K-Means iteration are combined with the Mean Shift algorithm, which recalculates the centroids by selecting different bandwidths. Grouping with the K-Means and Mean Shift algorithms yields different centroids, from which the optimum bandwidth value is found. The data dictionary in this research is sorted alphabetically.
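The K-Means-then-Mean-Shift pipeline the abstract describes can be sketched in one dimension. The data, initial centroids, and bandwidth below are hypothetical placeholders (e.g., word lengths), not the paper's coal-dictionary features.

```python
# Hypothetical 1-D sketch: cluster points with k-means, then refine each
# centroid with a flat-kernel mean-shift step, mirroring the abstract's
# "k-means output feeds mean shift" pipeline.
data = [2.0, 3.0, 2.5, 9.0, 10.0, 9.5]

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        groups = {c: [] for c in range(len(centers))}
        for p in points:
            # assign each point to its nearest centroid (Euclidean distance)
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else centers[c] for c, g in groups.items()]
    return centers

def mean_shift_step(point, points, bandwidth):
    # one flat-kernel mean-shift pass: average the neighbours inside the window
    window = [p for p in points if abs(p - point) <= bandwidth]
    return sum(window) / len(window)

centers = kmeans_1d(data, [0.0, 5.0])                         # -> [2.5, 9.5]
refined = [mean_shift_step(c, data, bandwidth=2.0) for c in centers]
```

Varying `bandwidth` changes where the mean-shift step places the refined centroids, which is the comparison the paper uses to pick an optimum bandwidth.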
Estimating a Signal In the Presence of an Unknown Background
We describe a method for fitting distributions to data which only requires
knowledge of the parametric form of either the signal or the background but not
both. The unknown distribution is fit using a non-parametric kernel density
estimator. The method returns parameter estimates as well as errors on those
estimates. Simulation studies show that these estimates are unbiased and that
the errors are correct.
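The fit the abstract describes — a parametric signal plus a non-parametric KDE background — can be sketched minimally. All data, the Gaussian signal shape, the bandwidth, and the grid scan below are illustrative assumptions, not the paper's actual estimator.

```python
import math

def gauss(x, mu, sigma):
    # Gaussian density, used both as the parametric signal and the KDE kernel
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde(x, sample, h):
    # non-parametric kernel density estimate with bandwidth h
    return sum(gauss(x, s, h) for s in sample) / len(sample)

# hypothetical data: a background-only sample (to build the KDE) and a mixed sample
background = [-2.0, -1.0, 0.5, 1.5, 2.0]
mixed = [0.0, 0.1, -0.1, 1.8, -1.9]

def neg_log_lik(frac, mu, sigma, h):
    # mixture density: frac * parametric signal + (1 - frac) * KDE background
    return -sum(math.log(frac * gauss(x, mu, sigma) + (1 - frac) * kde(x, background, h))
                for x in mixed)

# crude grid scan over the signal fraction; a real fit would also profile mu, sigma
fracs = [i / 20 for i in range(1, 20)]
best = min(fracs, key=lambda f: neg_log_lik(f, mu=0.0, sigma=0.2, h=0.5))
```

A real analysis would maximize the likelihood properly and propagate uncertainties, which is where the paper's unbiasedness and error-coverage claims come in.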
The Bregman Variational Dual-Tree Framework
Graph-based methods provide a powerful tool set for many non-parametric
frameworks in Machine Learning. In general, the memory and computational
complexity of these methods is quadratic in the number of examples in the data
which makes them quickly infeasible for moderate to large scale datasets. A
significant effort to find more efficient solutions to the problem has been
made in the literature. One of the state-of-the-art methods that has been
recently introduced is the Variational Dual-Tree (VDT) framework. Despite some
of its unique features, VDT is currently restricted only to Euclidean spaces
where the Euclidean distance quantifies the similarity. In this paper, we
extend the VDT framework beyond the Euclidean distance to more general Bregman
divergences that include the Euclidean distance as a special case. By
exploiting the properties of the general Bregman divergence, we show how the
new framework can maintain all the pivotal features of the VDT framework and
yet significantly improve its performance in non-Euclidean domains. We apply
the proposed framework to different text categorization problems and
demonstrate its benefits over the original VDT.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2013)
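The abstract's claim that the Euclidean distance is a special case of the Bregman divergence can be made concrete with the standard definition D_F(x, y) = F(x) − F(y) − ⟨∇F(y), x − y⟩. The sketch below is a generic illustration of that identity, not code from the VDT framework.

```python
import math

def bregman(F, gradF, x, y):
    # D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>
    return F(x) - F(y) - sum(g * (xi - yi) for g, xi, yi in zip(gradF(y), x, y))

# F(x) = ||x||^2 recovers the squared Euclidean distance
sq = lambda v: sum(c * c for c in v)
grad_sq = lambda v: [2 * c for c in v]

# F(x) = sum_i x_i log x_i (negative entropy) yields the generalized
# KL divergence, which reduces to KL for probability vectors
negent = lambda v: sum(c * math.log(c) for c in v)
grad_negent = lambda v: [math.log(c) + 1 for c in v]
```

For example, `bregman(sq, grad_sq, [1.0, 2.0], [3.0, 5.0])` equals the squared Euclidean distance (1−3)² + (2−5)² = 13, which is exactly the special case the VDT framework was previously restricted to.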