Clustering on the Edge: Learning Structure in Graphs
With the recent popularity of graphical clustering methods, there has been an
increased focus on the information between samples. We show how learning
cluster structure using edge features naturally and simultaneously determines
the most likely number of clusters and addresses data scale issues. These
results are particularly useful in instances where (a) there are a large number
of clusters and (b) we have some labeled edges. Applications in this domain
include image segmentation, community discovery and entity resolution. Our
model is an extension of the planted partition model, and our solution uses
results from correlation clustering, which achieves a partition
O(log(n))-close to the log-likelihood of the true clustering.
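As an illustrative sketch (not the paper's actual algorithm), a standard greedy "pivot" heuristic for correlation clustering on a graph with labeled positive edges can be written as follows; the function name and interface are hypothetical:

```python
import random

def pivot_correlation_clustering(nodes, positive_edges, seed=0):
    """Greedy 'pivot' heuristic for correlation clustering:
    repeatedly pick a random pivot node and cluster it together
    with every remaining node it shares a positive edge with."""
    rng = random.Random(seed)
    remaining = set(nodes)
    pos = {frozenset(e) for e in positive_edges}
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | {v for v in remaining
                             if frozenset((pivot, v)) in pos}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

On a toy graph of two positive-edge cliques {0,1} and {2,3}, the heuristic recovers both clusters without being told the number of clusters in advance, which mirrors the abstract's point that edge information determines the cluster count.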
System-Level Predictive Maintenance: Review of Research Literature and Gap Analysis
This paper reviews current literature in the field of predictive maintenance
from the system point of view. We differentiate the existing capabilities of
condition estimation and failure risk forecasting as currently applied to
simple components, from the capabilities needed to solve the same tasks for
complex assets. System-level analysis faces more complex latent degradation
states: it must comprehensively account for active maintenance programs at
each component level, consider coupling between different maintenance
actions, and reflect the increased monetary and safety costs of system
failures. As a result, methods that are effective for forecasting risk and
informing maintenance decisions regarding individual components do not readily
scale to provide reliable sub-system or system level insights. A novel holistic
modeling approach is needed to incorporate available structural and physical
knowledge and naturally handle the complexities of actively fielded and
maintained assets.
Comment: 24 pages, 3 figures
Performance Bounds for Pairwise Entity Resolution
One significant challenge to scaling entity resolution algorithms to massive
datasets is understanding how performance changes after moving beyond the realm
of small, manually labeled reference datasets. Unlike traditional machine
learning tasks, when an entity resolution algorithm performs well on small
hold-out datasets, there is no guarantee this performance holds on larger
hold-out datasets. We prove simple bounding properties between the performance
of a match function on a small validation set and the performance of a pairwise
entity resolution algorithm on arbitrarily sized datasets. Thus, our approach
enables optimization of pairwise entity resolution algorithms for large
datasets, using a small set of labeled data.
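To make the setting concrete, here is a minimal sketch (with a hypothetical interface, not the paper's notation) of estimating a match function's pairwise precision and recall on a small labeled validation set; these are the quantities the paper's bounds relate to large-scale performance:

```python
def pairwise_metrics(match_fn, labeled_pairs):
    """Estimate precision/recall of a pairwise match function on a
    small labeled validation set of (record_a, record_b, is_match)
    triples. match_fn(a, b) returns a truthy value for a predicted
    match."""
    tp = fp = fn = 0
    for a, b, is_match in labeled_pairs:
        pred = bool(match_fn(a, b))
        if pred and is_match:
            tp += 1
        elif pred and not is_match:
            fp += 1
        elif not pred and is_match:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

The abstract's claim is that such small-sample estimates can be used to bound the behavior of the full pairwise entity resolution algorithm on arbitrarily sized datasets.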
Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks
We describe a new approach to estimating relative risks in time-to-event
prediction problems with censored data in a fully parametric manner. Our
approach does not require the strong assumption of constant proportional
hazards imposed on the underlying survival distribution by the Cox
proportional hazards model. By jointly learning deep nonlinear
representations of the input covariates, we demonstrate the benefits of our
approach when used to estimate survival risks through extensive experimentation
on multiple real world datasets with different levels of censoring. We further
demonstrate advantages of our model in the competing risks scenario. To the
best of our knowledge, this is the first work involving fully parametric
estimation of survival times with competing risks in the presence of censoring.
Comment: Also appeared in NeurIPS 2019 Workshop on Machine Learning for
Healthcare (ML4H).
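For intuition about what "fully parametric with censoring" means, the following sketch writes out the censored log-likelihood of a single Weibull survival distribution; Deep Survival Machines learns a deep mixture of such parametric primitives, which is not reproduced here:

```python
import math

def weibull_censored_loglik(times, events, shape, scale):
    """Log-likelihood of right-censored data under one Weibull
    survival distribution. events[i] = 1 means an observed failure
    (contributes log f(t)); events[i] = 0 means censored
    (contributes log S(t))."""
    ll = 0.0
    for t, e in zip(times, events):
        z = (t / scale) ** shape
        log_surv = -z                         # log S(t) = -(t/scale)^shape
        log_pdf = (math.log(shape / scale)
                   + (shape - 1) * math.log(t / scale) - z)
        ll += log_pdf if e else log_surv
    return ll
```

Because the likelihood is fully parametric, censored observations contribute an exact survival term rather than requiring the partial-likelihood machinery of the Cox model.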
Double Adaptive Stochastic Gradient Optimization
Adaptive moment methods have been remarkably successful in deep learning
optimization, particularly in the presence of noisy and/or sparse gradients. We
further the advantages of adaptive moment techniques by proposing a family of
double adaptive stochastic gradient methods, DASGrad. They leverage the
complementary ideas of the adaptive moment algorithms widely used by the deep
learning community and recent advances in adaptive probabilistic algorithms. We
analyze the theoretical convergence improvements of our approach in a
stochastic convex optimization setting, and provide empirical validation of our
findings with convex and non-convex objectives. We observe that the benefits
of DASGrad increase with model complexity and gradient variability, and we
explore the resulting utility in extensions of distribution-matching multitask
learning.
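For reference, a single update step of Adam, the canonical adaptive moment method this family builds on, looks like the sketch below; DASGrad's second layer of adaptivity (over the sampling distribution) is not shown, as its details are not given in the abstract:

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: exponential moving averages of the gradient (m)
    and squared gradient (v), bias-corrected, then a per-coordinate
    scaled update. t is the 1-based step count."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v
```

The per-coordinate scaling by the second-moment estimate is what makes such methods robust to noisy and sparse gradients, the regime the abstract highlights.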
On the Interaction Effects Between Prediction and Clustering
Machine learning systems increasingly depend on pipelines of multiple
algorithms to provide high quality and well structured predictions. This paper
argues that interaction effects between clustering and prediction (e.g.,
classification, regression) algorithms can cause subtle adverse behaviors
during cross-validation that may not be initially apparent. In particular, we
focus on the problem of estimating the out-of-cluster (OOC) prediction loss
given an approximate clustering with a known probabilistic error rate.
Traditional cross-validation techniques exhibit significant empirical bias in
this setting, and the few attempts to estimate and correct for these effects
are intractable on larger datasets. Further, no previous work has been able to
characterize the conditions under which these empirical effects occur, and if
they do, what properties they have. We precisely answer these questions by
providing theoretical properties which hold in various settings, and prove that
expected out-of-cluster loss behavior rapidly decays with even minor clustering
errors. Fortunately, we are able to leverage these same properties to construct
hypothesis tests and scalable estimators necessary for correcting the problem.
Empirical results on benchmark datasets validate our theoretical results and
demonstrate how scaling techniques provide solutions to new classes of
problems.
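As a minimal sketch of the evaluation setting (not the paper's estimator), the following splitter holds out entire clusters at a time, so the test fold measures out-of-cluster loss rather than the optimistic within-cluster loss that standard cross-validation can produce:

```python
def out_of_cluster_splits(cluster_ids, n_folds=2):
    """Cross-validation splits that hold out whole clusters. Each
    yielded (train, test) pair contains index lists such that no
    cluster appears on both sides of the split."""
    clusters = sorted(set(cluster_ids))
    folds = [clusters[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        test = [i for i, c in enumerate(cluster_ids) if c in held]
        train = [i for i, c in enumerate(cluster_ids) if c not in held]
        yield train, test
```

When the clustering itself is only approximate, even this split leaks information across folds, which is precisely the interaction effect the paper characterizes and corrects for.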
Pairwise Feedback for Data Programming
The scalability of the labeling process and the attainable quality of labels
have become limiting factors for many applications of machine learning. The
programmatic creation of labeled datasets via the synthesis of noisy heuristics
provides a promising avenue to address this problem. We propose to improve
modeling of latent class variables in the programmatic creation of labeled
datasets by incorporating pairwise feedback into the process. We discuss the
ease with which such pairwise feedback can be obtained or generated in many
application domains. Our experiments show that even a small number of sources
of pairwise feedback can substantially improve the quality of the posterior
estimate of the latent class variable.
Comment: Presented at the NeurIPS 2019 workshop on Learning with Rich
Experience: Integration of Learning Paradigms.
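For context, the simplest label model that data programming improves upon is a plain vote over noisy heuristic labelers; the sketch below (hypothetical interface, not the paper's model) shows the baseline that pairwise feedback would refine:

```python
from collections import Counter

def majority_vote(labeling_fns, x):
    """Combine noisy heuristic labelers ('labeling functions') by
    majority vote. A labeler returns a class label, or None to
    abstain; returns None if every labeler abstains."""
    votes = [lf(x) for lf in labeling_fns]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]
```

Generative label models replace this uniform vote with learned per-source accuracies; the paper's contribution is to sharpen that posterior further using pairwise feedback about whether two items share a latent class.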
Novel Prediction Techniques Based on Clusterwise Linear Regression
In this paper we explore different regression models based on Clusterwise
Linear Regression (CLR). CLR aims to find the partition of the data into
clusters such that linear regressions fitted to each cluster minimize the
overall mean squared error on the whole dataset. The main obstacle to using the
fitted regression models for prediction on unseen test points is the absence of
a reasonable way to obtain CLR cluster labels when the values of the target
variable are unknown. In this paper we propose two novel approaches to solving
this problem. The first approach, predictive CLR, builds a separate
classification model to predict test CLR labels. The second approach,
constrained CLR, utilizes a set of user-specified constraints that require
certain points to be assigned to the same cluster. Assuming the constraint values are
known for the test points, they can be directly used to assign CLR labels. We
evaluate these two approaches on three UCI ML datasets as well as on a large
corpus of health insurance claims. We show that both of the proposed algorithms
significantly improve over the known CLR-based regression methods. Moreover,
predictive CLR consistently outperforms linear regression and random forest,
and shows comparable performance to support vector regression on UCI ML
datasets. The constrained CLR approach achieves the best performance on the
health insurance dataset, while incurring only a modest increase in
computational time over linear regression.
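A standard way to fit CLR, which the sketch below follows (an illustrative alternating scheme, not necessarily the paper's exact fitting procedure), is to alternate between assigning each point to the line that fits it best and refitting a least-squares line per cluster:

```python
import numpy as np

def clusterwise_linear_regression(x, y, k=2, iters=20, seed=0):
    """Alternating fit for Clusterwise Linear Regression (CLR) on
    1-D inputs: assign each point to its best-fitting line, then
    refit a least-squares line for each cluster."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(x))
    lines = []
    for _ in range(iters):
        lines = []
        for c in range(k):
            mask = labels == c
            if mask.sum() < 2:              # keep degenerate clusters alive
                lines.append((0.0, float(y.mean())))
                continue
            slope, intercept = np.polyfit(x[mask], y[mask], 1)
            lines.append((slope, intercept))
        resid = np.stack([(y - (a * x + b)) ** 2 for a, b in lines])
        labels = resid.argmin(axis=0)       # reassign to best line
    return labels, lines
```

Note that the assignment step uses the target y, which is exactly why, as the abstract says, CLR labels cannot be computed directly for test points where y is unknown.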
An Entity Resolution approach to isolate instances of Human Trafficking online
Human trafficking is a challenging law enforcement problem, and a large
amount of such activity manifests itself on various online forums. Given the
large, heterogeneous and noisy structure of this data, building models to
predict instances of trafficking is an even more convoluted task. In this
paper we propose an entity resolution pipeline using a notion of proxy labels,
in order to extract clusters from this data with a prior history of human
trafficking activity. We apply this pipeline to 5M records from backpage.com
and report on the performance of this approach, challenges in terms of
scalability, and some significant domain-specific characteristics of our
resolved entities.
Lass-0: sparse non-convex regression by local search
We compute approximate solutions to L0 regularized linear regression using L1
regularization, also known as the Lasso, as an initialization step. Our
algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise
search to determine a locally optimal L0 solution given any L1 regularization
solution. We present theoretical results of consistency under orthogonality and
appropriate handling of redundant features. Empirically, we use synthetic data
to demonstrate that Lass-0 solutions are closer to the true sparse support than
L1 regularization models. Additionally, in real-world data Lass-0 finds more
parsimonious solutions than L1 regularization while maintaining similar
predictive accuracy.
Comment: 8 pages, 1 figure. NIPS 2015 Workshop on Optimization (OPT2015).
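The stepwise search can be sketched as below; this is a deliberately simplified reading of the abstract (greedy single-feature toggles on the L0-penalized residual sum of squares, starting from a Lasso support), and the interface is hypothetical:

```python
import numpy as np

def lass0(X, y, init_support, lam=0.1):
    """Local stepwise search for an L0-regularized linear model,
    initialized at the support of an L1 (Lasso) solution: greedily
    add or drop one feature at a time while the penalized RSS
    improves."""
    def score(support):
        if not support:
            return float(y @ y)
        cols = sorted(support)
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        r = y - X[:, cols] @ beta
        return float(r @ r) + lam * len(support)

    support = set(init_support)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = support ^ {j}            # toggle feature j in/out
            if score(cand) < score(support) - 1e-12:
                support, improved = cand, True
    return sorted(support)
```

Starting from an over-selected Lasso support, each toggle that removes a redundant feature lowers the L0 penalty without hurting the fit, which is how the search reaches a more parsimonious local optimum.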