17 research outputs found
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments
Automatic Parameter Selection for Non-Redundant Clustering
High-dimensional datasets often contain multiple meaningful clusterings in
different subspaces. For example, objects can be clustered either by color,
weight, or size, revealing different interpretations of the given dataset. A
variety of approaches are able to identify such non-redundant clusterings.
However, most of these methods require the user to specify the expected number
of subspaces and clusters for each subspace. Stating these values is a
non-trivial problem and usually requires detailed knowledge of the input
dataset. In this paper, we propose a framework that utilizes the Minimum
Description Length Principle (MDL) to detect the number of subspaces and
clusters per subspace automatically. We describe an efficient procedure that
greedily searches the parameter space by splitting and merging subspaces and
clusters within subspaces. Additionally, an encoding strategy is introduced
that allows us to detect outliers in each subspace. Extensive experiments show
that our approach is highly competitive to state-of-the-art methods