Sparse Hopfield network reconstruction with regularization
We propose an efficient strategy to infer a sparse Hopfield network based on magnetizations and pairwise correlations measured through Glauber samplings. This strategy incorporates l1 regularization into the Bethe approximation by a quadratic approximation to the log-likelihood, and is able to further reduce the inference error of the unregularized Bethe approximation. The optimal regularization parameter is observed to scale as a power of the number of independent samples, with a scaling exponent that depends on the performance measure: it takes one value for the root mean squared error measure and another for the misclassification rate measure. The efficiency of this strategy is demonstrated for the sparse Hopfield model, but the method is generally applicable to other diluted mean-field models. In particular, it is simple to implement and carries no heavy computational cost.
Comment: 9 pages, 3 figures, Eur. Phys. J. B (in press)
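For orientation, the inverse problem above can be illustrated by a much cruder stand-in: naive mean-field inversion of the correlation matrix followed by soft-thresholding of the couplings. This is not the paper's Bethe-approximation scheme, and the function name is ours; it only shows the general shape of regularized network reconstruction from sampled correlations.

```python
import numpy as np

def nmf_couplings_l1(C, lam):
    """Infer pairwise couplings J from a correlation matrix C via the
    naive mean-field relation J ~ -(C^-1) off the diagonal, then
    soft-threshold to promote sparsity. A crude stand-in for the
    regularized Bethe-approximation inference described in the paper."""
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)  # no self-couplings
    return np.sign(J) * np.maximum(np.abs(J) - lam, 0.0)  # soft-threshold
```

Larger `lam` drives more couplings exactly to zero, mirroring the role of the regularization parameter discussed above.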
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, as well as those that use genetic algorithms, anytime methods, and boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a reference point for future research in this field.
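Many of the surveyed adaptations share one ingredient: scoring leaves and splits by expected misclassification cost rather than impurity. A minimal sketch of that ingredient (the function names and the binary-split framing are ours, not any single surveyed algorithm):

```python
import numpy as np

def expected_cost(counts, cost):
    """Expected misclassification cost of a leaf that predicts the
    cheapest class. counts[k] = number of class-k examples in the leaf;
    cost[i][j] = cost of predicting class i when the truth is class j."""
    counts = np.asarray(counts, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # total cost incurred if the leaf predicts class i, for every i
    per_prediction = cost @ counts
    return per_prediction.min()

def split_cost(left_counts, right_counts, cost):
    """Score of a candidate binary split: the sum of the two leaf costs.
    A cost-sensitive inducer picks the split minimizing this score."""
    return expected_cost(left_counts, cost) + expected_cost(right_counts, cost)
```

With a zero-diagonal, unit-cost matrix this reduces to counting minority examples, i.e. plain misclassification impurity.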
Median K-flats for hybrid linear modeling with many outliers
We describe the Median K-Flats (MKF) algorithm, a simple online method for
hybrid linear modeling, i.e., for approximating data by a mixture of flats.
This algorithm simultaneously partitions the data into clusters while finding
their corresponding best approximating l1 d-flats, so that the cumulative l1
error is minimized. The current implementation restricts d-flats to be
d-dimensional linear subspaces. It requires a negligible amount of storage, and
its complexity, when modeling data consisting of N points in D-dimensional
Euclidean space with K d-dimensional linear subspaces, is of order O(n K d D + n d^2 D), where n is the number of iterations required for convergence
(empirically on the order of 10^4). Since it is an online algorithm, data can
be supplied to it incrementally and it can incrementally produce the
corresponding output. The performance of the algorithm is carefully evaluated
using synthetic and real data.
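The alternation behind K-flats-style methods can be sketched in a few lines. Note this is a simplified batch variant with l2 refits via PCA, not the online, l1-minimizing MKF of the abstract; the function name, random initialization, and fixed iteration count are ours.

```python
import numpy as np

def k_subspaces(X, K, d, iters=20, seed=0):
    """Batch K-subspaces: alternately (1) assign each point to its nearest
    d-dimensional linear subspace and (2) refit each subspace by PCA.
    A simplified l2 stand-in for the online, l1-based MKF algorithm."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # random orthonormal D x d basis per cluster
    bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
    for _ in range(iters):
        # distance of each point to each subspace: ||x - B B^T x||
        dists = np.stack(
            [np.linalg.norm(X - X @ B @ B.T, axis=1) for B in bases], axis=1)
        labels = dists.argmin(axis=1)
        for k in range(K):
            pts = X[labels == k]
            if len(pts) >= d:
                # top-d right singular vectors span the best l2 subspace
                _, _, Vt = np.linalg.svd(pts, full_matrices=False)
                bases[k] = Vt[:d].T
    return labels, bases
```

Because both steps only touch one point or one cluster at a time, the same skeleton admits the incremental, online updates the abstract describes.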
Robust Classification for Imprecise Environments
In real-world environments it usually is difficult to specify target
operating conditions precisely, for example, target misclassification costs.
This uncertainty makes building robust classification systems problematic. We
show that it is possible to build a hybrid classifier that will perform at
least as well as the best available classifier for any target conditions. In
some cases, the performance of the hybrid actually can surpass that of the best
known classifier. This robust performance extends across a wide variety of
comparison frameworks, including the optimization of metrics such as accuracy,
expected cost, lift, precision, recall, and workforce utilization. The hybrid
also is efficient to build, to store, and to update. The hybrid is based on a
method for the comparison of classifier performance that is robust to imprecise
class distributions and misclassification costs. The ROC convex hull (ROCCH)
method combines techniques from ROC analysis, decision analysis and
computational geometry, and adapts them to the particulars of analyzing learned
classifiers. The method is efficient and incremental, minimizes the management
of classifier performance data, and allows for clear visual comparisons and
sensitivity analyses. Finally, we point to empirical evidence that a robust
hybrid classifier indeed is needed for many real-world problems.
Comment: 24 pages, 12 figures. To be published in Machine Learning Journal.
For related papers, see http://www.hpl.hp.com/personal/Tom_Fawcett/ROCCH
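The ROCCH selection rule can be sketched as follows: keep only the upper convex hull of the classifiers' (FPR, TPR) points, then pick the hull vertex minimizing expected cost for the target conditions. Function names are ours, and direct cost minimization over hull vertices stands in for the paper's iso-performance-line analysis.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_hull(points):
    """Upper convex hull of (fpr, tpr) points, always including the
    trivial classifiers (0, 0) and (1, 1). Only hull vertices can be
    optimal for some combination of class priors and error costs."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # pop the middle vertex when it lies on or below the chord
        # from hull[-2] to the new point
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def best_vertex(hull, p_pos, c_fp, c_fn):
    """Hull vertex minimizing expected cost for positive-class prior
    p_pos, false-positive cost c_fp and false-negative cost c_fn."""
    return min(hull, key=lambda v: c_fp * v[0] * (1.0 - p_pos)
                                   + c_fn * (1.0 - v[1]) * p_pos)
```

Changing the prior or the cost ratio moves the optimum along the hull without recomputing anything, which is the robustness the abstract emphasizes.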
NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and
image processing. The corresponding matrix inherits the complex structure of
the underlying network and is in certain applications densely populated. This
makes computations, in particular matrix-vector products, with the graph
Laplacian a hard task. A typical application is the computation of a number of
its eigenvalues and eigenvectors. Standard methods become infeasible as the
number of nodes in the graph is too large. We propose the use of the fast
summation based on the nonequispaced fast Fourier transform (NFFT) to perform
the dense matrix-vector product with the graph Laplacian fast without ever
forming the whole matrix. The enormous flexibility of the NFFT algorithm allows
us to embed the accelerated multiplication into Lanczos-based eigenvalues
routines or iterative linear system solvers and even consider other than the
standard Gaussian kernels. We illustrate the feasibility of our approach on a
number of test problems from image segmentation to semi-supervised learning
based on graph-based PDEs. In particular, we compare our approach with the
Nystr\"om method. Moreover, we present and test an enhanced, hybrid version of
the Nystr\"om method, which internally uses the NFFT.
Comment: 28 pages, 9 figures
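The structural idea, applying the graph Laplacian without ever forming it, can be sketched with SciPy's LinearOperator. Here the product is computed row by row in O(N^2 D) as a stand-in; the paper's contribution is replacing exactly this step with NFFT-based fast summation. The function name is ours.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def laplacian_operator(X, sigma=1.0):
    """Matrix-free graph Laplacian L = D - W of the fully connected graph
    with Gaussian weights w_ij = exp(-||x_i - x_j||^2 / sigma^2).
    Each matvec touches W one row at a time, so the N x N matrix is
    never stored; the paper accelerates exactly this product with
    NFFT-based fast summation instead of the direct loop used here."""
    N = X.shape[0]
    sq = (X ** 2).sum(axis=1)

    def row_weights(i):
        w = np.exp(-(sq[i] - 2.0 * X @ X[i] + sq) / sigma ** 2)
        w[i] = 0.0  # no self-loops
        return w

    deg = np.array([row_weights(i).sum() for i in range(N)])  # D = diag(deg)

    def matvec(v):
        Wv = np.array([row_weights(i) @ v for i in range(N)])
        return deg * v - Wv

    return LinearOperator((N, N), matvec=matvec, dtype=float)
```

The resulting operator can be handed directly to scipy.sparse.linalg.eigsh, giving the Lanczos-based eigenvalue routines mentioned in the abstract without the dense matrix.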
A Simple Iterative Algorithm for Parsimonious Binary Kernel Fisher Discrimination
By applying recent results in optimization theory, variously known as optimization transfer or majorize/minimize algorithms, an algorithm for binary kernel Fisher discriminant analysis is introduced that makes use of a non-smooth penalty on the coefficients to provide a parsimonious solution. The problem is converted into a smooth optimization that can be solved iteratively with no greater overhead than iteratively re-weighted least squares. The result is simple, easily programmed, and is shown to perform, in terms of both accuracy and parsimony, as well as or better than a number of leading machine learning algorithms on two well-studied and substantial benchmarks.
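The optimization-transfer step can be illustrated on the simpler problem of l1-penalized least squares: each |a_i| is majorized by a quadratic, so every MM step is a reweighted ridge solve, costing no more than iteratively re-weighted least squares. A hedged sketch of the idea, not the paper's kernel Fisher formulation (the function name and the eps smoothing are ours):

```python
import numpy as np

def irls_l1_least_squares(A, y, lam, iters=50, eps=1e-8):
    """Minimize ||A a - y||^2 + lam * ||a||_1 by majorize/minimize:
    |a_i| <= a_i^2 / (2 |a_i_old|) + |a_i_old| / 2, so each MM step is
    a ridge solve with per-coefficient weights lam / (2 |a_i_old|)."""
    a = np.linalg.lstsq(A, y, rcond=None)[0]  # unpenalized start
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(iters):
        w = lam / (2.0 * (np.abs(a) + eps))  # majorizer weights (eps avoids /0)
        a = np.linalg.solve(AtA + np.diag(w), Aty)
    return a
```

On an orthonormal design the minimizer is the soft-thresholded target: coefficients shrink by lam/2, and small ones are driven to (numerically) zero, which is the parsimony mechanism the abstract describes.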
Randomized hybrid linear modeling by local best-fit flats
The hybrid linear modeling problem is to identify a set of d-dimensional
affine sets in a D-dimensional Euclidean space. It arises, for example, in
object tracking and structure from motion. The hybrid linear model can be
considered as the second simplest (behind linear) manifold model of data. In
this paper we will present a very simple geometric method for hybrid linear
modeling based on selecting a set of local best fit flats that minimize a
global l1 error measure. The size of the local neighborhoods is determined
automatically by Jones' l2 beta numbers; it is proven under certain
geometric conditions that good local neighborhoods exist and are found by our
method. We also demonstrate how to use this algorithm for fast determination of
the number of affine subspaces. We give extensive experimental evidence
demonstrating the state of the art accuracy and speed of the algorithm on
synthetic and real hybrid linear data.
Comment: To appear in the proceedings of CVPR 201
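The select-from-local-candidates strategy can be sketched as follows. A fixed neighborhood size and greedy selection stand in for the paper's automatic, beta-number-driven scale selection, and all names are ours.

```python
import numpy as np

def local_best_fit_flats(X, K, d, n_neighbors=10, n_candidates=50, seed=0):
    """Select K d-dimensional affine flats: fit candidate flats by PCA on
    local neighborhoods, then greedily pick the set minimizing the global
    l1 error (sum of point-to-nearest-flat distances). A simplified sketch
    with a fixed neighborhood size; the paper chooses the neighborhood
    scale automatically via Jones' l2 beta numbers."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # candidate flats: (center, orthonormal basis) from local PCA
    idx = rng.choice(N, size=min(n_candidates, N), replace=False)
    cands = []
    for i in idx:
        nbrs = X[np.argsort(np.linalg.norm(X - X[i], axis=1))[:n_neighbors]]
        c = nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(nbrs - c, full_matrices=False)
        cands.append((c, Vt[:d].T))

    def dists(flat):
        c, B = flat
        Y = X - c
        return np.linalg.norm(Y - Y @ B @ B.T, axis=1)

    D = np.stack([dists(f) for f in cands])  # candidate-by-point distances
    best = np.full(N, np.inf)
    chosen = []
    for _ in range(K):  # greedily minimize the global l1 error
        gains = np.minimum(D, best).sum(axis=1)
        j = int(np.argmin(gains))
        chosen.append(cands[j])
        best = np.minimum(best, D[j])
    return chosen
```

Because each candidate is fit from a single local neighborhood, clean local structure yields exact flats, which is why local fits can achieve a small global error.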