Sparse Partitioning Around Medoids
Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering
technique to use with arbitrary distance functions or similarities, where each
cluster is represented by its most central object, called the medoid or the
discrete median. In operations research, this family of problems is also known
as the facility location problem (FLP). FastPAM recently introduced a speedup for
large k to make it applicable for larger problems, but the method still has a
runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric
variant of this problem, to be used for example on graph data such as road
networks. By exploiting sparsity, we can avoid the quadratic runtime and memory
requirements, and make this method scalable to even larger problems, as long as
we are able to build a small enough graph of sufficient connectivity to perform
local optimization. Furthermore, we consider asymmetric cases, where the set of
medoids is not identical to the set of points to be covered (or in the
interpretation of facility location, where the possible facility locations are
not identical to the consumer locations). Because of sparsity, it may be
impossible to cover all points with just k medoids if k is chosen too small,
which would render the problem unsolvable, and this breaks common heuristics
for finding a good starting condition. We hence consider determining k as part
of the optimization problem and propose to first construct a greedy initial
solution with a larger k, then to optimize the problem by alternating between
PAM-style "swap" operations, where the result is improved by replacing medoids
with better alternatives, and "remove" operations that reduce k, until neither
allows further improving the result quality. We demonstrate the usefulness of
this method on a problem from electrical engineering, with the input graph
derived from cartographic data.
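The greedy-initialize-then-alternate scheme described above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the chapter's algorithm: it precomputes all-pairs shortest paths with Dijkstra (which is exactly the quadratic cost the chapter's sparse approach avoids on large graphs), and it assumes a hypothetical per-medoid cost `penalty` so that a "remove" step can ever improve the objective; the chapter's actual objective may be formulated differently.

```python
import heapq


def dijkstra(adj, src):
    """Single-source shortest paths on a sparse weighted graph.
    adj maps node -> list of (neighbor, edge_weight) pairs."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


def total_cost(dists, medoids, points):
    """Sum over points of the distance to the nearest medoid
    (infinite if a point is not reachable from any medoid)."""
    return sum(min(dists[m].get(p, float("inf")) for m in medoids)
               for p in points)


def improve_once(adj, dists, points, medoids, penalty):
    """Return an improved medoid set via one swap or one remove, or None."""
    cost = total_cost(dists, medoids, points) + penalty * len(medoids)
    # PAM-style "swap": replace one medoid with a non-medoid node.
    for i in range(len(medoids)):
        for u in adj:
            if u in medoids:
                continue
            cand = medoids[:i] + [u] + medoids[i + 1:]
            if total_cost(dists, cand, points) + penalty * len(cand) < cost:
                return cand
    # "remove": drop one medoid, reducing k, if the penalized cost improves.
    for i in range(len(medoids)):
        cand = medoids[:i] + medoids[i + 1:]
        if cand and total_cost(dists, cand, points) + penalty * len(cand) < cost:
            return cand
    return None


def sparse_k_medoids(adj, points, k_init, penalty):
    """Greedy initialization with a generous k_init, then alternate
    swap and remove steps until neither improves the objective."""
    # All-pairs distances: fine for toy graphs, quadratic in general.
    dists = {u: dijkstra(adj, u) for u in adj}
    medoids = []
    for _ in range(k_init):
        best = min((u for u in adj if u not in medoids),
                   key=lambda u: total_cost(dists, medoids + [u], points))
        medoids.append(best)
    while True:
        nxt = improve_once(adj, dists, points, medoids, penalty)
        if nxt is None:
            return medoids
        medoids = nxt
```

On a small path graph, a small `penalty` keeps all initial medoids, while a large `penalty` drives the remove step until a single central medoid remains, illustrating how k becomes part of the optimization.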
Discrepancy-Based Active Learning for Domain Adaptation
The goal of the paper is to design active learning strategies which lead to
domain adaptation under an assumption of covariate shift in the case of a
Lipschitz labeling function. Building on previous work by Mansour et al. (2009),
we adapt the concept of discrepancy distance between source and target
distributions to restrict the maximization over the hypothesis class to a
localized class of functions that perform accurate labeling on the
source domain. We derive generalization error bounds for such active learning
strategies in terms of Rademacher average and localized discrepancy for general
loss functions that satisfy a regularity condition. A practical K-medoids
algorithm that can handle large data sets is inferred from the
theoretical bounds. Our numerical experiments show that the proposed algorithm
is competitive against other state-of-the-art active learning techniques in the
context of domain adaptation, in particular on large data sets of around one
hundred thousand images.
Comment: 28 pages, 11 figures
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps, from data collection, summarization, and clustering to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how to enhance their scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.