2,557 research outputs found

    Diagonal and Low-Rank Matrix Decompositions, Correlation Matrices, and Ellipsoid Fitting

    In this paper we establish links between, and new results for, three problems that are not usually considered together. The first is a matrix decomposition problem that arises in areas such as statistical modeling and signal processing: given a matrix X formed as the sum of an unknown diagonal matrix and an unknown low rank positive semidefinite matrix, decompose X into these constituents. The second problem we consider is to determine the facial structure of the set of correlation matrices, a convex set also known as the elliptope. This convex body, and particularly its facial structure, plays a role in applications from combinatorial optimization to mathematical finance. The third problem is a basic geometric question: given points v_1, v_2, ..., v_n \in \R^k (where n > k) determine whether there is a centered ellipsoid passing exactly through all of the points. We show that in a precise sense these three problems are equivalent. Furthermore we establish a simple sufficient condition on a subspace U that ensures any positive semidefinite matrix L with column space U can be recovered from D + L for any diagonal matrix D using a convex optimization-based heuristic known as minimum trace factor analysis. This result leads to a new understanding of the structure of rank-deficient correlation matrices and a simple condition on a set of points that ensures there is a centered ellipsoid passing through them. Comment: 20 pages

    Statistical significance of communities in networks

    Nodes in real-world networks are usually organized in local modules. These groups, called communities, are intuitively defined as sub-graphs with a larger density of internal connections than of external links. In this work, we introduce a new measure aimed at quantifying the statistical significance of single communities. Extreme and Order Statistics are used to predict the statistics associated with individual clusters in random graphs. These distributions allow us to define the significance of a community as the probability that a generic clustering algorithm finds such a group in a random graph. The method is successfully applied in the case of real-world networks for the evaluation of the significance of their communities. Comment: 9 pages, 8 figures, 2 tables. The software to calculate the C-score can be found at http://filrad.homelinux.org/cscor

    Classification of Message Spreading in a Heterogeneous Social Network

    Social networks such as Twitter, Facebook and LinkedIn have become increasingly popular. They have introduced new habits and new ways of communicating, and every day they collect a large amount of information from different sources. Most existing research focuses on the analysis of homogeneous social networks, i.e. networks with a single type of node and link. However, in the real world, social networks offer several types of nodes and links. Hence, in order to preserve as much information as possible, it is important to consider social networks as heterogeneous and uncertain. The goal of our paper is to classify a social message based on its spreading in the network and the theory of belief functions. The proposed classifier interprets the spread of messages on the network, the paths crossed and the types of links. We tested our classifier on a real-world network that we collected from Twitter, and our experiments show the performance of our belief classifier.

    The Weibull-Geometric distribution

    In this paper we introduce, for the first time, the Weibull-Geometric distribution which generalizes the exponential-geometric distribution proposed by Adamidis and Loukas (1998). The hazard function of the latter distribution is monotone decreasing but the hazard function of the new distribution can take more general forms. Unlike the Weibull distribution, the proposed distribution is useful for modeling unimodal failure rates. We derive the cumulative distribution and hazard functions and the density of the order statistics, and calculate expressions for its moments and for the moments of the order statistics. We give expressions for the Rényi and Shannon entropies. The maximum likelihood estimation procedure is discussed and an EM algorithm (Dempster et al., 1977; McLachlan and Krishnan, 1997) is provided for estimating the parameters. We obtain the information matrix and discuss inference. Applications to real data sets are given to show the flexibility and potential of the proposed distribution.

    How Sample Completeness Affects Gamma-Ray Burst Classification

    Unsupervised pattern recognition algorithms support the existence of three gamma-ray burst classes: Class I (long, large fluence bursts of intermediate spectral hardness), Class II (short, small fluence, hard bursts), and Class III (soft bursts of intermediate durations and fluences). The algorithms surprisingly assign larger membership to Class III than to either of the other two classes. A known systematic bias has been previously used to explain the existence of Class III in terms of Class I; this bias allows the fluences and durations of some bursts to be underestimated (Hakkila et al., ApJ 538, 165, 2000). We show that this bias primarily affects only the longest bursts and cannot explain the bulk of the Class III properties. We resolve the question of Class III existence by demonstrating how samples obtained using standard trigger mechanisms fail to preserve the duration characteristics of small peak flux bursts. Sample incompleteness is thus primarily responsible for the existence of Class III. In order to avoid this incompleteness, we show how a new dual timescale peak flux can be defined in terms of peak flux and fluence. The dual timescale peak flux preserves the duration distribution of faint bursts and correlates better with spectral hardness (and presumably redshift) than either peak flux or fluence. The techniques presented here are generic and have applicability to the studies of other transient events. The results also indicate that pattern recognition algorithms are sensitive to sample completeness; this can influence the study of large astronomical databases such as those found in a Virtual Observatory. Comment: 29 pages, 6 figures, 3 tables, Accepted for publication in The Astrophysical Journal

    The Time Machine: A Simulation Approach for Stochastic Trees

    In the following paper we consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation and subsequent maximization of the likelihood function associated with such models. This typically consists of using importance sampling (IS) and sequential Monte Carlo (SMC) techniques. The approach proceeds by simulating the tree, backward in time from observed data, to a most recent common ancestor (MRCA). However, in many cases, the computational time and variance of estimators are too high to make standard approaches useful. In this paper we propose to stop the simulation, subsequently yielding biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the balance between loss of accuracy, saving in computing time and variance reduction. Comment: 22 pages, 5 figures

    Mixtures of nonparametric autoregressions

    We consider data generating mechanisms which can be represented as mixtures of finitely many regression or autoregression models. We propose nonparametric estimators for the functions characterising the various mixture components based on a local quasi maximum likelihood approach and prove their consistency. We present an EM algorithm for calculating the estimates numerically, which is mainly based on iteratively applying common local smoothers, and discuss its convergence properties. © American Statistical Association and Taylor & Francis 2011.

    Average of trial peaks versus peak of average profile : impact on change of direction biomechanics

    The aims of this study were twofold: firstly, to compare lower limb kinematic and kinetic variables during a sprint and 90° cutting task between two averaging methods of obtaining discrete data (peak of average profile vs. average of individual trial peaks); secondly, to determine the effect of averaging methods on participant ranking of each variable within a group. Twenty-two participants, from multiple sports, performed a 90° cut, whereby lower limb kinematics and kinetics were assessed via 3D motion and ground reaction force (GRF) analysis. Six of the eight dependent variables (vertical and horizontal GRF; hip flexor, knee flexor, and knee abduction moments, and knee abduction angle) were significantly greater (p ≤ 0.001, g = 0.10-0.37, 2.74-10.40%) when expressed as an average of trial peaks compared to peak of average profiles. Trivial (g ≤ 0.04) and minimal differences (≤ 0.94%) were observed in peak hip and knee flexion angle between averaging methods. Very strong correlations (ρ ≥ 0.901, p < 0.001) were observed for rankings of participants between averaging methods for all variables. Practitioners and researchers should obtain discrete data based on the average of trial peaks because it is not influenced by misalignments and variations in trial peak locations, in contrast to the peak from average profile.
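
    The difference between the two averaging methods named in this abstract can be illustrated with a minimal sketch. The function names and the toy trial data below are hypothetical and are not taken from the study; the sketch only shows why peaks that occur at different time points lower the peak of the average profile but leave the average of trial peaks unchanged.

```python
import numpy as np


def average_of_trial_peaks(trials):
    """Take each trial's own peak, then average those peaks."""
    trials = np.asarray(trials, dtype=float)
    return trials.max(axis=1).mean()


def peak_of_average_profile(trials):
    """Average the trials point by point, then take the peak of that curve."""
    trials = np.asarray(trials, dtype=float)
    return trials.mean(axis=0).max()


# Hypothetical data: three trials (rows) sampled at four time points (columns).
# Every trial peaks at 10, but at a different time point.
trials = np.array([
    [0.0, 10.0, 0.0, 0.0],
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 0.0, 0.0, 10.0],
])

avg_peaks = average_of_trial_peaks(trials)
peak_avg = peak_of_average_profile(trials)
print(avg_peaks, peak_avg)
```

    For any set of trials the average of trial peaks is at least as large as the peak of the average profile, with equality only when every trial attains its maximum at a common time point; this is consistent with the direction of the differences reported above.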

    Evolution of the X-ray Emission of Radio-Quiet Quasars

    We report new Chandra observations of seven optically faint, z \sim 4 radio-quiet quasars. We have combined these new observations with previous Chandra observations of radio-quiet quasars to create a sample of 174 sources. These sources have 0.1 < z < 4.7, and 10^{44} ergs s^{-1} < nu L_{nu} (2500 \AA) < 10^{48} ergs s^{-1}. The X-ray detection fraction is 90%. We find that the X-ray loudness of radio-quiet quasars decreases with UV luminosity and increases with redshift. The model that is best supported by the data has a linear dependence of optical-to-X-ray ratio, alpha_{ox}, on cosmic time, and a quadratic dependence of alpha_{ox} on log L_{UV}, where alpha_{ox} becomes X-ray quiet more rapidly at higher log L_{UV}. We find no significant evidence for a relationship between the X-ray photon index, Gamma_X, and the UV luminosity, and we find marginally significant evidence that the X-ray continuum flattens with increasing z (2 sigma). The Gamma_X-z anti-correlation may be the result of X-ray spectral curvature, redshifting of a Compton reflection component into the observed Chandra band, and/or redshifting of a soft excess out of the observed Chandra band. Using the results for Gamma_X, we show that the alpha_{ox}-z relationship is unlikely to be a spurious result caused by redshifting of the observable X-ray spectral region. A correlation between alpha_{ox} and z implies evolution of the accretion process. We present a qualitative comparison of these new results with models for accretion disk emission. Comment: Accepted by ApJ, 48 pages, 10 figures, 5 tables

    Adaptive Seeding for Gaussian Mixture Models

    We present new initialization methods for the expectation-maximization algorithm for multivariate Gaussian mixture models. Our methods are adaptations of the well-known K-means++ initialization and the Gonzalez algorithm. We thereby aim to close the gap between simple random methods, e.g. uniform sampling, and complex methods that depend crucially on the right choice of hyperparameters. Our extensive experiments on artificial as well as real-world data sets indicate the usefulness of our methods compared to common techniques, including methods that apply the original K-means++ and Gonzalez algorithms directly. Comment: This is a preprint of a paper that has been accepted for publication in the Proceedings of the 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2016. The final publication is available at link.springer.com (http://link.springer.com/chapter/10.1007/978-3-319-31750-2 24
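
    For orientation, the original K-means++ seeding that this abstract names as a starting point can be sketched as follows. This is the standard baseline only, not the adaptive variants proposed in the paper; the function name `kmeans_pp_seeds` and the toy data are hypothetical.

```python
import numpy as np


def kmeans_pp_seeds(X, k, rng):
    """Pick k seed points from X using standard K-means++ (D^2) sampling."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # First seed: chosen uniformly at random.
    first = rng.integers(n)
    seeds = [X[first]]
    # Squared distance from every point to its nearest seed so far.
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total > 0:
            # Sample the next seed with probability proportional to d2.
            idx = rng.choice(n, p=d2 / total)
        else:
            # All points coincide with existing seeds: fall back to uniform.
            idx = rng.integers(n)
        seeds.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(seeds)


# Hypothetical toy data: three distinct points, three seeds requested.
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
seeds = kmeans_pp_seeds(X, 3, np.random.default_rng(0))
print(seeds)
```

    Because an already chosen point has zero distance to its nearest seed, it cannot be drawn again while other points remain, so the three seeds in this toy example are always the three data points in some order. The resulting seeds would typically serve as initial component means for the EM algorithm.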