The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
We present the Bayesian Case Model (BCM), a general framework for Bayesian
case-based reasoning (CBR) and prototype classification and clustering. BCM
brings the intuitive power of CBR to a Bayesian generative framework. The BCM
learns prototypes, the "quintessential" observations that best represent
clusters in a dataset, by performing joint inference on cluster labels,
prototypes and important features. Simultaneously, BCM pursues sparsity by
learning subspaces, the sets of features that play important roles in the
characterization of the prototypes. The prototype and subspace representation
provides quantitative benefits in interpretability while preserving
classification accuracy. Human subject experiments verify statistically
significant improvements to participants' understanding when using explanations
produced by BCM, compared to those given by prior art. Comment: Published in
Neural Information Processing Systems (NIPS) 2014.
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers does not receive much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of the
contrastive contexts in which outliers are located, as well as the relation
between outliers and contexts, is usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
Scalable and Robust Community Detection with Randomized Sketching
This paper explores and analyzes the unsupervised clustering of large
partially observed graphs. We propose a scalable and provable randomized
framework for clustering graphs generated from the stochastic block model. The
clustering is first applied to a sub-matrix of the graph's adjacency matrix
associated with a reduced graph sketch constructed using random sampling. Then,
the clusters of the full graph are inferred based on the clusters extracted
from the sketch using a correlation-based retrieval step. Uniform random node
sampling is shown to improve the computational complexity over clustering of
the full graph when the cluster sizes are balanced. A new random degree-based
node sampling algorithm is presented which significantly improves upon the
performance of the clustering algorithm even when clusters are unbalanced. This
algorithm improves the phase transitions for matrix-decomposition-based
clustering with regard to computational complexity and minimum cluster size,
which are shown to be nearly dimension-free in the low inter-cluster
connectivity regime. A third sampling technique is shown to improve balance by
randomly sampling nodes based on spatial distribution. We provide analysis and
numerical results using a convex clustering algorithm based on matrix
completion.
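The sketch-then-retrieve idea in the abstract above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the parameter values, and the toy two-block graph are hypothetical. The paper clusters the sketch with a convex, matrix-completion-based algorithm; this example swaps in a much simpler modularity-based spectral split for two clusters, and the retrieval score is a plain averaged inner product rather than the authors' exact correlation measure.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Draw a symmetric adjacency matrix from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Sample the strict upper triangle, then mirror it (no self-loops).
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels

def cluster_sketch_two_way(sub_adj):
    """Split the sketch into two clusters (assumption: a modularity-based
    spectral split stands in for the paper's convex clustering step)."""
    degrees = sub_adj.sum(axis=1)
    modularity = sub_adj - np.outer(degrees, degrees) / degrees.sum()
    _, vecs = np.linalg.eigh(modularity)  # eigenvalues in ascending order
    return (vecs[:, -1] > 0).astype(int)  # sign of the leading eigenvector

def retrieve_full_labels(adj, sampled, sketch_labels, n_clusters=2):
    """Assign every node to the sketch cluster it connects to most densely."""
    cols = adj[:, sampled]  # each node's links to the sampled nodes
    scores = np.zeros((adj.shape[0], n_clusters))
    for k in range(n_clusters):
        members = sketch_labels == k
        scores[:, k] = cols[:, members].sum(axis=1) / max(members.sum(), 1)
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
adjacency, truth = sample_sbm([150, 150], p_in=0.7, p_out=0.05, rng=rng)

# Uniform random node sampling: keep 100 of the 300 nodes as the sketch.
sampled = rng.choice(adjacency.shape[0], size=100, replace=False)
sketch_labels = cluster_sketch_two_way(adjacency[np.ix_(sampled, sampled)])
predicted = retrieve_full_labels(adjacency, sampled, sketch_labels)

# Cluster labels are only defined up to a swap of the two label names.
agreement = np.mean(predicted == truth)
accuracy = max(agreement, 1.0 - agreement)
print(f"fraction of nodes labelled consistently: {accuracy:.3f}")
```

Only the 100-by-100 sub-matrix is ever decomposed, which is where the saving over clustering the full graph comes from; the retrieval step needs one pass over the sampled columns.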
Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes
We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.
Discovering an active subspace in a single-diode solar cell model
Predictions from science and engineering models depend on the values of the
model's input parameters. As the number of parameters increases, algorithmic
parameter studies like optimization or uncertainty quantification require many
more model evaluations. One way to combat this curse of dimensionality is to
seek an alternative parameterization with fewer variables that produces
comparable predictions. The active subspace is a low-dimensional linear
subspace defined by important directions in the model's input space; input
perturbations along these directions change the model's prediction more, on
average, than perturbations orthogonal to the important directions. We describe
a method for checking if a model admits an exploitable active subspace, and we
apply this method to a single-diode solar cell model with five input
parameters. We find that the maximum power of the solar cell has a dominant
one-dimensional active subspace, which enables us to perform thorough parameter
studies in one dimension instead of five.
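As a rough illustration of how an active subspace can be checked for, the following Python sketch estimates the matrix C = E[∇f ∇fᵀ] from sampled gradients and inspects its eigenvalues. It is a minimal sketch under stated assumptions: the five-input toy function f(x) = exp(wᵀx), the weight vector `w`, and the helper name `active_subspace` are all hypothetical and are not the single-diode solar cell model from the abstract.

```python
import numpy as np

def active_subspace(grad_f, sample_inputs):
    """Estimate C = E[grad f grad f^T] by Monte Carlo and eigendecompose it.

    Returns eigenvalues in descending order and the matching eigenvectors
    as columns. A large gap after the first k eigenvalues suggests a
    k-dimensional active subspace.
    """
    grads = np.array([grad_f(x) for x in sample_inputs])  # shape (M, n)
    C = grads.T @ grads / len(sample_inputs)
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical toy model with five inputs: f(x) = exp(w . x).
# Its gradient always points along w, so the active subspace is 1-D.
w = np.array([1.0, 0.5, -0.3, 0.2, 0.1])

def grad_f(x):
    return np.exp(w @ x) * w

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 5))  # samples of the input space

eigvals, eigvecs = active_subspace(grad_f, X)
direction = eigvecs[:, 0]  # dominant direction, defined up to sign

print("eigenvalues:", eigvals)
print("alignment with w:", abs(direction @ w) / np.linalg.norm(w))
```

For a model whose gradients are not available analytically, the same routine could be fed finite-difference gradient estimates; the subsequent one-dimensional parameter study would then vary only the projection of the inputs onto `direction`.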