The DD-classifier in the functional setting
The Maximum Depth was the first attempt to use data depths instead of
multivariate raw data to construct a classification rule. Recently, the
DD-classifier has solved several serious limitations of the Maximum Depth
classifier but some issues still remain. This paper is devoted to extending the
DD-classifier in the following ways: first, to surpass the limitation of the
DD-classifier when more than two groups are involved. Second, to apply standard
classification methods (such as NN, linear or quadratic classifiers, recursive
partitioning, ...) to DD-plots to obtain useful insights through the diagnostics
of these methods. And third, to integrate different sources of information
(data depths or multivariate functional data) in a unified way in the
classification procedure. Besides, as the DD-classifier trick is especially
useful in the functional framework, an enhanced revision of several functional
data depths is done in the paper. A simulation study and applications to some
classical real datasets are also provided showing the power of the new
proposal.
Comment: 29 pages, 6 figures, 6 tables, Supplemental R Code and Dat
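As a rough illustration of the two ideas named in this abstract, the sketch below computes DD-plot coordinates (the depth of each point with respect to every group) and applies the Maximum Depth rule to them. It assumes the Mahalanobis depth purely for convenience; the abstract does not say which depths the paper uses, and the function names `mahalanobis_depth`, `dd_coordinates` and `max_depth_classify` are hypothetical.

```python
import numpy as np

def mahalanobis_depth(points, sample):
    """Mahalanobis depth of each point with respect to one sample (an assumed choice of depth)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    sample = np.atleast_2d(np.asarray(sample, dtype=float))
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = points - mu
    # Squared Mahalanobis distance of every point to the sample mean.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)

def dd_coordinates(points, groups):
    """One column per group: the depth of each point with respect to that group."""
    return np.column_stack([mahalanobis_depth(points, g) for g in groups])

def max_depth_classify(points, groups):
    """Maximum Depth rule: assign each point to the group in which it is deepest."""
    return dd_coordinates(points, groups).argmax(axis=1)
```

Because `dd_coordinates` returns one column per group, its output can be passed to any ordinary classifier, which is the sense in which more than two groups can be handled in this sketch.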
Ball: An R package for detecting distribution difference and association in metric spaces
The rapid development of modern technology facilitates the appearance of
numerous unprecedented complex data which do not satisfy the axioms of
Euclidean geometry, while most of the statistical hypothesis tests are
available in Euclidean or Hilbert spaces. To properly analyze the data of more
complicated structures, efforts have been made to solve the fundamental test
problems in more general spaces. In this paper, a publicly available R package
Ball is provided to implement Ball statistical test procedures for K-sample
distribution comparison and test of mutual independence in metric spaces, which
extend the test procedures for two sample distribution comparison and test of
independence. Tailor-made algorithms and engineering techniques are
employed in the Ball package to speed up computation. Two real data
analyses and several numerical studies have been
performed, and the results demonstrate the power of the Ball package in analyzing
complex data, e.g., spherical data and symmetric positive matrix data
Network depth: identifying median and contours in complex networks
Centrality descriptors are widely used to rank nodes according to specific
concept(s) of importance. Despite the large number of centrality measures
available nowadays, it is still poorly understood how to identify the node
which can be considered as the `centre' of a complex network. In fact, this
problem corresponds to finding the median of a complex network. The median is a
non-parametric and robust estimator of the location parameter of a probability
distribution. In this work, we present the most natural generalisation of the
concept of median to the realm of complex networks, discussing its advantages
for defining the centre of the system and percentiles around that centre. To
this aim, we introduce a new statistical data depth and we apply it to networks
embedded in a geometric space induced by different metrics. The application of
our framework to empirical networks allows us to identify median nodes which
are socially or biologically relevant
A statistical reduced-reference method for color image quality assessment
Although color is a fundamental feature of human visual perception, it has
been largely unexplored in the reduced-reference (RR) image quality assessment
(IQA) schemes. In this paper, we propose a natural scene statistic (NSS)
method, which efficiently uses this information. It is based on the statistical
deviation between the steerable pyramid coefficients of the reference color
image and the degraded one. We propose and analyze the multivariate generalized
Gaussian distribution (MGGD) to model the underlying statistics. In order to
quantify the degradation, we develop and evaluate two measures, based
respectively on the geodesic distance between two MGGDs and on the closed form
of the Kullback-Leibler divergence (KLD). We performed an extensive evaluation of
both metrics in various color spaces (RGB, HSV, CIELAB and YCrCb) using the TID
2008 benchmark and the FRTV Phase I validation process. Experimental results
demonstrate the effectiveness of the proposed framework in achieving good
consistency with human visual perception. Furthermore, the best configuration
is obtained with the CIELAB color space combined with the KLD deviation measure
Change Point Methods on a Sequence of Graphs
Given a finite sequence of graphs, e.g., coming from technological,
biological, and social networks, the paper proposes a methodology to identify
possible changes in stationarity in the stochastic process generating the
graphs. In order to cover a large class of applications, we consider the
general family of attributed graphs where both topology (number of vertices and
edge configuration) and related attributes are allowed to change even in the
stationary case. Novel Change Point Methods (CPMs) are proposed that (i) map
graphs into a vector domain; (ii) apply a suitable statistical test in the
vector space; (iii) detect the change, if any, according to a confidence
level and provide an estimate of its time of occurrence. Two specific
multivariate CPMs have been designed: one that detects shifts in the
distribution mean, the other addressing generic changes affecting the
distribution. We ground our proposal with theoretical results showing how to
relate the inference attained in the numerical vector space to the graph
domain, and vice versa. We also show how to extend the methodology for handling
multiple change points in the same sequence. Finally, the proposed CPMs have
been validated on real data sets coming from epileptic-seizure detection
problems and on labeled data sets for graph classification. Results show the
effectiveness of the proposed methods in relevant application scenarios
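The first and last of the three steps in this abstract (map graphs to vectors, then estimate when the change occurred) can be sketched as follows. This is a minimal sketch, not the paper's CPMs: the graph features and the CUSUM-style location estimate are assumptions chosen for brevity, the significance test of steps (ii) and (iii) is omitted, and `graph_features` and `estimate_mean_shift` are hypothetical names.

```python
import numpy as np

def graph_features(adjacency):
    """Map an undirected graph without self-loops to a small feature vector (assumed embedding)."""
    a = np.asarray(adjacency, dtype=float)
    n = a.shape[0]
    n_edges = a.sum() / 2.0          # each edge appears twice in the matrix
    mean_degree = a.sum() / n if n else 0.0
    return np.array([n, n_edges, mean_degree])

def estimate_mean_shift(vectors):
    """CUSUM-style estimate of where the mean of a vector sequence shifts.

    Returns (k, statistic): the first k observations are taken as pre-change,
    and statistic is the norm of the cumulative centered sum at k.
    No significance threshold is applied here.
    """
    x = np.asarray(vectors, dtype=float)
    centered = x - x.mean(axis=0)
    # The final cumulative sum is zero by construction, so it is dropped.
    cusum = np.cumsum(centered, axis=0)[:-1]
    norms = np.linalg.norm(cusum, axis=1)
    k = int(norms.argmax()) + 1
    return k, float(norms[k - 1])
```

In practice the features would usually be standardized before the scan, so that no single coordinate dominates the norm.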
The affinely invariant distance correlation
Székely, Rizzo and Bakirov (Ann. Statist. 35 (2007) 2769-2794) and
Székely and Rizzo (Ann. Appl. Statist. 3 (2009) 1236-1265), in two seminal
papers, introduced the powerful concept of distance correlation as a measure of
dependence between sets of random variables. We study in this paper an affinely
invariant version of the distance correlation and an empirical version of that
distance correlation, and we establish the consistency of the empirical
quantity. In the case of subvectors of a multivariate normally distributed
random vector, we provide exact expressions for the affinely invariant distance
correlation in both finite-dimensional and asymptotic settings, and in the
finite-dimensional case we find that the affinely invariant distance
correlation is a function of the canonical correlation coefficients. To
illustrate our results, we consider time series of wind vectors at the
Stateline wind energy center in Oregon and Washington, and we derive the
empirical auto and cross distance correlation functions between wind vectors at
distinct meteorological stations.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ558 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
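For orientation, the empirical distance correlation can be computed from double-centered pairwise distance matrices. The sketch below then obtains an affinely invariant variant by whitening each sample with the inverse square root of its sample covariance before computing distances. The whitening step reflects our reading of the construction and is an assumption, as are the function names; the sample covariance matrices must be nonsingular.

```python
import numpy as np

def _double_centered_distances(x):
    """Pairwise Euclidean distances with row, column and grand means removed."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Empirical distance correlation (V-statistic form) between paired samples."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def _whiten(x):
    """Center x and multiply by the inverse square root of its sample covariance."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    xc = x - x.mean(axis=0)
    cov = np.atleast_2d(np.cov(xc, rowvar=False))
    vals, vecs = np.linalg.eigh(cov)
    return xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

def affinely_invariant_dcor(x, y):
    """Distance correlation of the whitened samples (assumed construction)."""
    return distance_correlation(_whiten(x), _whiten(y))
```

Whitened versions of a sample and of any invertible affine transform of it differ only by an orthogonal map, which leaves Euclidean distances unchanged; that is why the value does not move under such transforms.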
Regression with Distance Matrices
Data types that lie in metric spaces but not in vector spaces are difficult
to use within the usual regression setting, whether as the response or as a
predictor. We represent the information in these variables using distance
matrices, which requires only the specification of a distance function. A
low-dimensional representation of such distance matrices can be obtained using
methods such as multidimensional scaling. Once these variables have been
represented as scores, an internal model linking the predictors and the
response can be developed using standard methods. We call scoring the
transformation from a new observation to a score while backscoring is a method
to represent a score as an observation in the data space. Both methods are
essential for prediction and explanation. We illustrate the methodology for
shape data, unregistered curve data and correlation matrices using motion
capture data from an experiment to study the motion of children with cleft lip.
Comment: 18 pages, 7 figure
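The pipeline described here, from distance matrix to low-dimensional scores to a standard model on those scores, can be sketched with classical multidimensional scaling followed by ordinary least squares. Both choices are assumptions made for illustration: the abstract names multidimensional scaling only as one possible method, and `classical_mds` and `fit_on_scores` are hypothetical names. Scoring of new observations and backscoring are not shown.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: scores whose Euclidean distances approximate dist."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give an inner-product (Gram) matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    top = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top)

def fit_on_scores(scores, response):
    """Least-squares linear model of the response on the MDS scores, with intercept."""
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef, design @ coef
```

Only the distance function is specific to the data type; once scores exist, any standard regression method could replace the least-squares step.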