    The Randomized Dependence Coefficient

    We introduce the Randomized Dependence Coefficient (RDC), a measure of non-linear dependence between random variables of arbitrary dimension based on the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. RDC is defined in terms of the correlation of random non-linear copula projections; it is invariant with respect to marginal distribution transformations, has low computational cost, and is easy to implement: just five lines of R code, included at the end of the paper.
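
The recipe described in this abstract (copula transform, random non-linear projections, then correlation) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's R code: the function names, the sine non-linearity, and the defaults `k=10` and `s=1.0` are assumptions chosen for the example, and the largest canonical correlation is computed via an SVD-based orthonormalization.

```python
import numpy as np


def copula_transform(a):
    """Map each column to empirical CDF values (ranks scaled into (0, 1])."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    # Double argsort yields 0-based ranks per column (ties broken arbitrarily).
    ranks = a.argsort(axis=0).argsort(axis=0) + 1
    return ranks / a.shape[0]


def random_features(u, k, s, rng):
    """Centered random sine projections of a copula-transformed sample."""
    n, d = u.shape
    u1 = np.hstack([u, np.ones((n, 1))])      # ones column acts as a random bias
    w = rng.normal(scale=s, size=(d + 1, k))  # assumed Gaussian projection weights
    phi = np.sin(u1 @ w)
    return phi - phi.mean(axis=0)


def orthonormal_basis(m, rcond=1e-10):
    """Orthonormal basis of the column space, dropping near-null directions."""
    u, sv, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, sv > rcond * sv[0]]


def rdc_sketch(x, y, k=10, s=1.0, seed=0):
    """Largest canonical correlation between random features of x and y."""
    rng = np.random.default_rng(seed)
    fx = random_features(copula_transform(x), k, s, rng)
    fy = random_features(copula_transform(y), k, s, rng)
    qx, qy = orthonormal_basis(fx), orthonormal_basis(fy)
    # Singular values of qx^T qy are the canonical correlations.
    corr = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(corr[0], 1.0))
```

Because the inputs are replaced by their ranks before projection, any strictly increasing transformation of a variable leaves the result unchanged, which is the marginal invariance the abstract refers to. On a noise-free quadratic relationship such as `rdc_sketch(x, x**2)` the value should be close to 1, while independent samples should give a much smaller value.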

    Characterizing the network topology of the energy landscapes of atomic clusters

    By dividing potential energy landscapes into basins of attraction surrounding minima and linking those basins that are connected by transition state valleys, a network description of energy landscapes naturally arises. These networks are characterized in detail for a series of small Lennard-Jones clusters and show behaviour characteristic of small-world and scale-free networks. However, unlike many such networks, this topology cannot reflect the rules governing the dynamics of network growth, because they are static spatial networks. Instead, the heterogeneity in the networks stems from differences in the potential energy of the minima, and hence the hyperareas of their associated basins of attraction. The low-energy minima with large basins of attraction act as hubs in the network. Comparisons to randomized networks with the same degree distribution reveal structuring in the networks that reflects their spatial embedding. Comment: 14 pages, 11 figures

    From Review to Rating: Exploring Dependency Measures for Text Classification

    Various text analysis techniques exist that attempt to uncover unstructured information from text. In this work, we explore using statistical dependence measures for textual classification, representing text as word vectors. Student satisfaction scores on a 3-point scale and their free text comments written about university subjects are used as the dataset. We have compared two textual representations: a frequency word representation and term frequency relationship to word vectors, and found that word vectors provide greater accuracy. However, these word vectors have a large number of features, which aggravates the burden of computational complexity. Thus, we explored using a non-linear dependency measure for feature selection by maximizing the dependence between the text reviews and corresponding scores. Our quantitative and qualitative analysis on a student satisfaction dataset shows that our approach achieves comparable accuracy to the full feature vector, while being an order of magnitude faster in testing. These text analysis and feature reduction techniques can be used for other textual data applications such as sentiment analysis. Comment: 8 pages

    Selecting the primary endpoint in a randomized clinical trial: the ARE method

    The decision on the primary endpoint in a randomized clinical trial is of paramount importance and the combination of several endpoints might be a reasonable choice. Gómez and Lagakos (2013) have developed a method that quantifies how much more efficient it could be to use a composite instead of an individual relevant endpoint. From the information provided by the frequencies of observing the component endpoints in the control group and by the relative treatment effects on each individual endpoint, the asymptotic relative efficiency (ARE) can be computed. This article presents the applicability of the ARE method as a practical and objective tool to evaluate which components, among the plausible ones, are more efficient in the construction of the primary endpoint. The method is illustrated with two real cardiovascular clinical trials and is extended to allow for different dependence structures between the times to the individual endpoints. The influence of this choice on the recommendation on whether or not to use the composite endpoint as the primary endpoint for the investigation is studied. We conclude that the recommendation between using the composite or the relevant endpoint depends only on the frequencies of the endpoints and the relative effects of the treatment. Peer Reviewed. Postprint (author's final draft)

    Self-similar disk packings as model spatial scale-free networks

    The network of contacts in space-filling disk packings, such as the Apollonian packing, is examined. These networks provide an interesting example of spatial scale-free networks, where the topology reflects the broad distribution of disk areas. A wide variety of topological and spatial properties of these systems are characterized. Their potential as models for networks of connected minima on energy landscapes is discussed. Comment: 13 pages, 12 figures; some bugs fixed and further discussion of higher-dimensional packings

    Extension of the asymptotic relative efficiency method to select the primary endpoint in a randomized clinical trial

    We extend the ARE method proposed in Gómez and Lagakos (2013), devised to decide which primary endpoint to choose when comparing two treatments in a randomized clinical trial. The ARE method is based on the Asymptotic Relative Efficiency (ARE) between two logrank tests to compare two treatments: one is based on a relevant endpoint E1 while the other is based on a composite endpoint E* = E1 ∪ E2, where E2 is an additional endpoint. The ARE depends, besides some intuitive parameters, on the joint law of the times T1 and T2 from randomization to E1 and E2, respectively. Gómez and Lagakos (2013) characterize this joint law by means of Frank's copula. In our work, several families of copulas can be chosen for the bivariate survival function of (T1, T2) so that different dependence structures between T1 and T2 are feasible. We motivate the problem and show how to apply the method through a real cardiovascular clinical trial. We explore the influence of the copula chosen on the ARE value by means of a simulation study. We conclude that the recommendation on whether or not to use the composite endpoint as the primary endpoint for the investigation is, almost always, independent of the copula chosen. Preprint
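
For orientation, the two objects this abstract relies on can be written out in standard notation. The symbols $T^{*}$, $C_\theta$, $u$, $v$ and $\theta$ are introduced here for illustration and do not appear in the abstract; how the copula is combined with the marginal laws of T1 and T2 follows the paper and is not reproduced here. The time to the composite endpoint is the earlier of the two component times, and Frank's copula has the usual one-parameter form:

```latex
T^{*} = \min(T_1, T_2), \qquad
C_\theta(u, v) = -\frac{1}{\theta}\,
  \ln\!\left( 1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1} \right),
  \quad \theta \neq 0 .
```

As $\theta \to 0$ this copula tends to the independence copula $uv$, so the single parameter $\theta$ controls the strength and sign of the dependence.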