The Randomized Dependence Coefficient
We introduce the Randomized Dependence Coefficient (RDC), a measure of
non-linear dependence between random variables of arbitrary dimension based on
the Hirschfeld-Gebelein-R\'enyi Maximum Correlation Coefficient. RDC is defined
in terms of correlation of random non-linear copula projections; it is
invariant with respect to marginal distribution transformations, has low
computational cost and is easy to implement: just five lines of R code,
included at the end of the paper.
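The steps named in the abstract (copula transform, random non-linear projections, correlation) can be sketched in Python as follows. This is a hedged illustration, not the paper's R code: the function names, the defaults `k=20` and `s=1/6`, the sine non-linearity, and the SVD-based canonical correlation step are all assumptions of this sketch.

```python
import numpy as np

def _copula_features(a, k, s, rng):
    """Rank-transform each column, then apply k random sine projections."""
    a = np.asarray(a, dtype=float)
    a = a.reshape(a.shape[0], -1)            # accept 1-D or 2-D input
    n, d = a.shape
    # Empirical copula: ordinal ranks scaled to (0, 1]; ties broken arbitrarily.
    ranks = np.argsort(np.argsort(a, axis=0), axis=0) + 1
    u = np.column_stack([ranks / n, np.ones(n)])   # extra column acts as a bias
    w = rng.standard_normal((d + 1, k)) * (s / (d + 1))
    return np.sin(u @ w)

def _orthonormal_basis(f, tol=1e-8):
    """Orthonormal basis of the centered feature columns (drops tiny directions)."""
    f = f - f.mean(axis=0)
    u, sv, _ = np.linalg.svd(f, full_matrices=False)
    return u[:, sv > tol * sv[0]]

def rdc(x, y, k=20, s=1/6, seed=0):
    """Largest canonical correlation between random copula features of x and y."""
    rng = np.random.default_rng(seed)
    qx = _orthonormal_basis(_copula_features(x, k, s, rng))
    qy = _orthonormal_basis(_copula_features(y, k, s, rng))
    # Singular values of qx^T qy are the canonical correlations.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(rho[0], 1.0))
```

Because only ranks enter the computation, applying a strictly increasing transformation to either variable leaves the result unchanged, which is the invariance the abstract refers to.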
Characterizing the network topology of the energy landscapes of atomic clusters
By dividing potential energy landscapes into basins of attractions
surrounding minima and linking those basins that are connected by transition
state valleys, a network description of energy landscapes naturally arises.
These networks are characterized in detail for a series of small Lennard-Jones
clusters and show behaviour characteristic of small-world and scale-free
networks. However, unlike many such networks, this topology cannot reflect the
rules governing the dynamics of network growth, because they are static spatial
networks. Instead, the heterogeneity in the networks stems from differences in
the potential energy of the minima, and hence the hyperareas of their
associated basins of attraction. The low-energy minima with large basins of
attraction act as hubs in the network. Comparisons to randomized networks with
the same degree distribution reveal structuring in the networks that reflects
their spatial embedding. Comment: 14 pages, 11 figures
From Review to Rating: Exploring Dependency Measures for Text Classification
Various text analysis techniques exist, which attempt to uncover unstructured
information from text. In this work, we explore using statistical dependence
measures for textual classification, representing text as word vectors. Student
satisfaction scores on a 3-point scale and their free text comments written
about university subjects are used as the dataset. We have compared two textual
representations: a frequency word representation and term frequency
relationship to word vectors, and found that word vectors provide greater
accuracy. However, these word vectors have a large number of features, which
increases the computational burden. Thus, we explored using a
non-linear dependency measure for feature selection by maximizing the
dependence between the text reviews and corresponding scores. Our quantitative
and qualitative analysis on a student satisfaction dataset shows that our
approach achieves comparable accuracy to the full feature vector, while being
an order of magnitude faster in testing. These text analysis and feature
reduction techniques can be used for other textual data applications such as
sentiment analysis. Comment: 8 pages
Selecting the primary endpoint in a randomized clinical trial: the ARE method
The decision on the primary endpoint in a randomized clinical trial is of paramount importance, and the combination of several endpoints might be a reasonable choice. Gómez and Lagakos (2013) have developed a method that quantifies how much more efficient it could be to use a composite instead of an individual relevant endpoint. From the information provided by the frequencies of observing the component endpoints in the control group and by the relative treatment effects on each individual endpoint, the asymptotic relative efficiency (ARE) can be computed. This article presents the applicability of the ARE method as a practical and objective tool to evaluate which components, among the plausible ones, are more efficient in the construction of the primary endpoint. The method is illustrated with two real cardiovascular clinical trials and is extended to allow for different dependence structures between the times to the individual endpoints. The influence of this choice on the recommendation on whether or not to use the composite endpoint as the primary endpoint for the investigation is studied. We conclude that the choice between the composite and the relevant endpoint depends only on the frequencies of the endpoints and the relative effects of the treatment. Peer Reviewed. Postprint (author's final draft)
Self-similar disk packings as model spatial scale-free networks
The networks of contacts in space-filling disk packings, such as the
Apollonian packing, are examined. These networks provide an interesting example
of spatial scale-free networks, where the topology reflects the broad
distribution of disk areas. A wide variety of topological and spatial
properties of these systems are characterized. Their potential as models for
networks of connected minima on energy landscapes is discussed. Comment: 13 pages, 12 figures; some bugs fixed and further discussion of
higher-dimensional packing
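As a rough illustration of how a contact network of this kind can be generated, the sketch below builds the commonly used Apollonian network: start from three mutually touching disks and, at each generation, place a new disk in every interstice and link it to the three disks bounding it. The starting configuration and the function names are assumptions of this sketch; the paper's own construction may differ (for example in how the outer boundary is treated).

```python
from collections import Counter

def apollonian_network(generations):
    """Return (number of nodes, set of edges) after the given number of generations."""
    edges = {(0, 1), (0, 2), (1, 2)}   # three mutually touching disks
    faces = [(0, 1, 2)]                # interstices still to be filled
    n_nodes = 3
    for _ in range(generations):
        new_faces = []
        for a, b, c in faces:
            d = n_nodes                # new disk touching a, b and c
            n_nodes += 1
            edges.update({(a, d), (b, d), (c, d)})
            # Filling one interstice leaves three smaller ones.
            new_faces.extend([(a, b, d), (a, c, d), (b, c, d)])
        faces = new_faces
    return n_nodes, edges

def degree_counts(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())
```

Calling `degree_counts(apollonian_network(g)[1])` for increasing `g` shows many low-degree nodes (the smallest, most recent disks) alongside a few high-degree hubs (the largest, earliest disks), the qualitative pattern the abstract attributes to the broad distribution of disk areas.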
Extension of the asymptotic relative efficiency method to select the primary endpoint in a randomized clinical trial
We extend the ARE method proposed in Gómez and Lagakos (2013), devised to decide which primary endpoint to choose when comparing two treatments in a randomized clinical trial. The ARE method is based on the Asymptotic Relative Efficiency (ARE) between two logrank tests to compare two treatments: one is based on a relevant endpoint E1 while the other is based on a composite endpoint E* = E1 ∪ E2, where E2 is an additional endpoint. The ARE depends, besides some intuitive parameters, on the joint law of the times T1 and T2 from randomization to E1 and E2, respectively. Gómez and Lagakos (2013) characterize this joint law by means of Frank’s copula. In our work, several families of copulas can be chosen for the bivariate survival function of (T1, T2) so that different dependence structures between T1 and T2 are feasible. We motivate the problem and show how to apply the method through a real cardiovascular clinical trial. We explore the influence of the copula chosen on the ARE value by means of a simulation study. We conclude that the recommendation on whether or not to use the composite endpoint as the primary endpoint for the investigation is, almost always, independent of the copula chosen. Preprint
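To make the setup concrete, the following sketch simulates dependent times T1 and T2 whose joint survival function is built from Frank's copula, and takes the time to the composite endpoint as min(T1, T2). It does not compute the ARE itself. The exponential marginals, the parameter names (`theta`, `rate1`, `rate2`) and the conditional-inversion sampler are assumptions made for illustration only.

```python
import numpy as np

def sample_frank(n, theta, rng):
    """Draw n pairs (u, v) from Frank's copula (theta != 0) by conditional inversion."""
    u = 1.0 - rng.random(n)            # uniform on (0, 1]
    w = 1.0 - rng.random(n)            # conditional probability level
    a = np.exp(-theta * u)
    c = np.expm1(-theta)               # exp(-theta) - 1
    # Solve dC(u, v)/du = w for v.
    v = -np.log1p(w * c / (w + (1.0 - w) * a)) / theta
    return u, v

def composite_times(n, theta, rate1, rate2, seed=0):
    """Simulate (T1, T2, T*) with exponential marginals (an assumption) and T* = min(T1, T2)."""
    rng = np.random.default_rng(seed)
    u, v = sample_frank(n, theta, rng)
    # u and v play the role of the marginal survival probabilities S1(T1), S2(T2).
    t1 = -np.log(u) / rate1
    t2 = -np.log(v) / rate2
    return t1, t2, np.minimum(t1, t2)
```

Swapping `sample_frank` for a sampler from another copula family while keeping the marginals fixed is one way to explore how the dependence structure affects the distribution of the composite time.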