Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
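As a minimal illustration of the exploratory summaries the review opens with, the sketch below computes the two most basic circular-data statistics, the mean direction and the mean resultant length; the function name is illustrative, but the formulas are the standard ones from directional statistics.

```python
import numpy as np

def circular_summary(theta):
    """Circular mean direction and mean resultant length for a sample
    of angles (radians). Standard directional-statistics summaries;
    the function name itself is an illustrative choice."""
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    n = len(theta)
    Rbar = np.hypot(C, S) / n       # mean resultant length, in [0, 1]
    mean_dir = np.arctan2(S, C)     # circular mean direction
    return mean_dir, Rbar
```

Note that the arithmetic mean of the raw angles would be misleading near the 0/2π wrap-around; averaging the unit vectors (cos θ, sin θ) instead is what makes these summaries well defined on the circle.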
The Multivariate Watson Distribution: Maximum-Likelihood Estimation and other Aspects
This paper studies fundamental aspects of modelling data using multivariate
Watson distributions. Although these distributions are natural for modelling
axially symmetric data (i.e., unit vectors where $\pm\mathbf{x}$ are
equivalent), using them in high dimensions can be difficult. Why so? Largely
because for Watson distributions even basic tasks such as maximum-likelihood
estimation are numerically challenging. To tackle these numerical
difficulties some approximations have been
derived---but these are either grossly inaccurate in high dimensions
(\emph{Directional Statistics}, Mardia & Jupp, 2000) or, when reasonably
accurate (\emph{J. Machine Learning Research, W. & C.P., v2}, Bijral \emph{et
al.}, 2007, pp. 35--42), lack theoretical justification. We derive new
approximations to the maximum-likelihood estimates; our approximations are
theoretically well-defined, numerically accurate, and easy to compute. We build
on our parameter estimation and discuss mixture-modelling with Watson
distributions; here we uncover a hitherto unknown connection to the
"diametrical clustering" algorithm of Dhillon \emph{et al.}
(\emph{Bioinformatics}, 19(13), 2003, pp. 1612--1619).
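To make the estimation problem concrete: for the Watson density, proportional to $\exp(\kappa(\boldsymbol{\mu}^\top\mathbf{x})^2)$, the ML estimate of $\boldsymbol{\mu}$ (for $\kappa > 0$) is the leading eigenvector of the sample scatter matrix, and $\hat\kappa$ solves an equation involving Kummer's confluent hypergeometric function $M(1/2, d/2, \kappa)$. The sketch below solves that equation numerically by root finding rather than using the paper's closed-form approximations; it assumes the concentrated case $\kappa > 0$ and a moderate $\kappa$ range (the girdle case and the large-$\kappa$ regime, where $M$ overflows, are omitted).

```python
import numpy as np
from scipy.special import hyp1f1
from scipy.optimize import brentq

def watson_mle(X):
    """Illustrative sketch: ML estimates (mu, kappa) for a Watson
    distribution on the unit sphere in R^d, assuming kappa > 0.
    X is an (n, d) array of unit vectors (signs irrelevant)."""
    n, d = X.shape
    S = X.T @ X / n                   # sample scatter matrix
    evals, evecs = np.linalg.eigh(S)
    mu = evecs[:, -1]                 # leading eigenvector
    r = mu @ S @ mu                   # = largest eigenvalue of S
    a, c = 0.5, d / 2.0
    # kappa solves g(kappa) = r, where g = M'(a,c,k)/M(a,c,k)
    # and M'(a,c,k) = (a/c) M(a+1,c+1,k).
    g = lambda k: (a / c) * hyp1f1(a + 1, c + 1, k) / hyp1f1(a, c, k)
    # Bracket kept below ~500 to avoid overflow in hyp1f1.
    kappa = brentq(lambda k: g(k) - r, 1e-6, 500.0)
    return mu, kappa
```

The numerical difficulty the abstract alludes to is visible here: for large $d$ or large $\kappa$, evaluating the ratio of Kummer functions directly is unstable, which is what motivates the paper's analytic approximations.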
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework for solving
pixel-level grouping problems such as instance segmentation; it consists of
two novel components. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable, has convergent
dynamics, and admits a probabilistic interpretation. Backpropagating the
group-weighted loss through this module allows learning to focus only on
correcting embedding errors that will not be resolved during subsequent
clustering. Our framework, while conceptually simple and theoretically
grounded, is also practically
effective and computationally efficient. We demonstrate substantial
improvements over the state of the art in instance segmentation for object
proposal generation, and demonstrate the benefits of the grouping loss for
classification tasks such as boundary detection and semantic segmentation.
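The recurrent grouping module described above can be sketched as mean shift on the unit hypersphere: each iteration re-weights all embedding vectors by a von Mises-Fisher kernel of their cosine similarities and re-projects the weighted mean onto the sphere. The sketch below is a plain NumPy illustration of that iteration, not the paper's implementation; the `bandwidth` and `iters` parameters are illustrative stand-ins for the learned kernel bandwidth and the unrolled recurrence depth.

```python
import numpy as np

def spherical_mean_shift(E, bandwidth=10.0, iters=10):
    """Illustrative sketch of mean shift on the unit hypersphere with a
    von Mises-Fisher kernel. E is an (n, d) array of embeddings."""
    X = E / np.linalg.norm(E, axis=1, keepdims=True)  # project to sphere
    Y = X.copy()
    for _ in range(iters):
        K = np.exp(bandwidth * (Y @ X.T))  # kernel weights from cosine sim
        Y = K @ X                          # kernel-weighted mean of points
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # back to the sphere
    return Y
```

Because every step is composed of matrix products, exponentials, and normalizations, the whole iteration is differentiable, which is what lets the grouping loss be backpropagated through it in the paper's framework.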