15,193 research outputs found
Developments in the theory of randomized shortest paths with a comparison of graph node distances
There have lately been several suggestions for parametrized distances on a
graph that generalize the shortest path distance and the commute time or
resistance distance. The need for developing such distances has risen from the
observation that the above-mentioned common distances in many situations fail
to take into account the global structure of the graph. In this article, we
develop the theory of one family of graph node distances, known as the
randomized shortest path dissimilarity, which has its foundation in statistical
physics. We show that the randomized shortest path dissimilarity can be easily
computed in closed form for all pairs of nodes of a graph. Moreover, we come up
with a new definition of a distance measure that we call the free energy
distance. The free energy distance can be seen as an upgrade of the randomized
shortest path dissimilarity as it defines a metric, in addition to which it
satisfies the graph-geodetic property. The derivation and computation of the
free energy distance are also straightforward. We then make a comparison
between a set of generalized distances that interpolate between the shortest
path distance and the commute time, or resistance distance. This comparison
focuses on the applicability of the distances in graph node clustering and
classification. The comparison, in general, shows that the parametrized
distances perform well in the tasks. In particular, we see that the results
obtained with the free energy distance are among the best in all the
experiments.Comment: 30 pages, 4 figures, 3 table
Choosing Attribute Weights for Item Dissimilarity using Clikstream Data with an Application to a Product Catalog Map
In content- and knowledge-based recommender systems often a measure of (dis)similarity between items is used. Frequently, this measure is based on the attributes of the items. However, which attributes are important for the users of the system remains an important question to answer. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. Counted is how many times products are sold and based on this a Poisson regression model is estimated. Estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach on a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.dissimilarity measure;attribute weights;clickstream data;comparison
Dealing with Label Switching in Mixture Models Under Genuine Multimodality
The fitting of finite mixture models is an ill-defined estimation problem as completely different parameterizations can induce similar mixture distributions. This leads to multiple modes in the likelihood which is a problem for frequentist maximum likelihood estimation, and complicates statistical inference of Markov chain Monte Carlo draws in Bayesian estimation. For the analysis of the posterior density of these draws a suitable separation into different modes is desirable. In addition, a unique labelling of the component specific estimates is necessary to solve the label
switching problem. This paper presents and compares two approaches to achieve these goals: relabelling under multimodality and constrained clustering. The algorithmic details are discussed and their application is demonstrated on artificial and real-world data
Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree
In biological experiments researchers often have information in the form of a
graph that supplements observed numerical data. Incorporating the knowledge
contained in these graphs into an analysis of the numerical data is an
important and nontrivial task. We look at the example of metagenomic
data---data from a genomic survey of the abundance of different species of
bacteria in a sample. Here, the graph of interest is a phylogenetic tree
depicting the interspecies relationships among the bacteria species. We
illustrate that analysis of the data in a nonstandard inner-product space
effectively uses this additional graphical information and produces more
meaningful results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS402 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …