    A review of random matrix theory with an application to biological data

    Random matrix theory (RMT) is an area of study with applications in a wide variety of scientific disciplines. The foundation of RMT is the analysis of the eigenvalue behavior of matrices. The eigenvalues of a random matrix (a matrix with stochastic entries) behave differently than the eigenvalues of a matrix with non-random properties. Studying this bifurcation in eigenvalue behavior provides the means by which system-specific signals can be distinguished from randomness. In particular, RMT provides an algorithmic approach to objectively remove noise from matrices with embedded signals. Major advances in data acquisition capabilities have changed the way research is conducted in many fields. The biological sciences have been revolutionized by high-throughput techniques that enable genome-wide measurements and a systems-level approach to biology. These techniques are very promising, yet they produce a massive influx of data, which presents unique data processing challenges. A major task confronting researchers is how to filter out the inherent noise without losing valuable information. Studies have shown that RMT is an effective method to objectively process biological data. In this thesis, the underpinnings of RMT are explained and the function of the RMT algorithm used for data filtering is described. A survey of network analysis tools is also included to provide insight into how to begin a rigorous, mathematical analysis of networks. Furthermore, the results of applying the RMT algorithm to a set of miRNA data extracted from Bos taurus (domestic cow) are provided, along with an implementation of the resulting network in a network analysis tool. These preliminary results demonstrate the utility of RMT coupled with network analysis tools as a basis for biological discovery --Abstract, page iii
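    The eigenvalue "bifurcation" the abstract refers to can be illustrated numerically: a random symmetric (GOE-like) matrix shows level repulsion, so near-zero eigenvalue spacings are rare (Wigner surmise), while an uncorrelated spectrum has Poisson-like spacings that pile up near zero. The thesis's exact filtering algorithm is not given here; the sketch below (NumPy, with a deliberately crude rescaling in place of a proper spectral unfolding) only demonstrates the statistical contrast the method exploits.

    ```python
    import numpy as np

    def nn_spacings(matrix):
        """Nearest-neighbor spacings of the eigenvalue spectrum,
        rescaled so the mean spacing is 1 (a crude stand-in for
        a proper unfolding of the spectrum)."""
        eigvals = np.sort(np.linalg.eigvalsh(matrix))
        spacings = np.diff(eigvals)
        return spacings / spacings.mean()

    rng = np.random.default_rng(0)
    n = 500

    # GOE-like random symmetric matrix: spacings follow the Wigner
    # surmise, p(s) = (pi/2) s exp(-pi s^2 / 4), which vanishes at
    # s = 0 -- eigenvalues "repel" each other.
    a = rng.normal(size=(n, n))
    goe = (a + a.T) / 2
    s_goe = nn_spacings(goe)

    # Diagonal matrix of independent entries: no repulsion, so the
    # spacings are Poisson-like, p(s) = exp(-s), peaked at s = 0.
    diag = np.diag(rng.normal(size=n))
    s_poisson = nn_spacings(diag)

    # Level repulsion shows up as a deficit of very small spacings
    # in the random-matrix spectrum relative to the Poisson one.
    print("fraction of spacings < 0.1:")
    print("  GOE-like: ", (s_goe < 0.1).mean())
    print("  Poisson:  ", (s_poisson < 0.1).mean())
    ```

    In an RMT-based filter, the portion of an empirical correlation spectrum whose spacing statistics match the random-matrix prediction is treated as noise and removed, leaving the deviating (system-specific) part as signal.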

    Uncovering missing links with cold ends

    To evaluate the performance of missing-link prediction, the known data are randomly divided into two parts, the training set and the probe set. We argue that this straightforward and standard method may lead to a serious bias, since in real biological and information networks, missing links are more likely to connect low-degree nodes. We therefore study how to uncover missing links with low-degree nodes, namely cases where the links in the probe set have lower degree products than a random sampling would give. Experimental analysis of ten local similarity indices on four disparate real networks reveals a surprising result: the Leicht-Holme-Newman index [E. A. Leicht, P. Holme, and M. E. J. Newman, Phys. Rev. E 73, 026120 (2006)] performs the best, although it was known to be one of the worst indices when the probe set is a random sampling of all links. We further propose a parameter-dependent index, which considerably improves the prediction accuracy. Finally, we show the relevance of the proposed index under three real sampling methods. Comment: 16 pages, 5 figures, 6 tables
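    The local Leicht-Holme-Newman index mentioned above scores a node pair by its number of common neighbors divided by the product of the two degrees, which boosts pairs of low-degree ("cold end") nodes relative to raw common-neighbor counting. A minimal sketch on a hypothetical toy graph (the edge list below is invented for illustration):

    ```python
    from itertools import combinations

    def lhn1(neighbors, x, y):
        """Local Leicht-Holme-Newman similarity: common neighbors
        normalized by the product of the two node degrees."""
        common = len(neighbors[x] & neighbors[y])
        return common / (len(neighbors[x]) * len(neighbors[y]))

    # Toy undirected graph as an adjacency map (hypothetical data).
    edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Score every non-edge; higher scores suggest likelier missing links.
    existing = {frozenset(e) for e in edges}
    scores = {
        (u, v): lhn1(neighbors, u, v)
        for u, v in combinations(sorted(neighbors), 2)
        if frozenset((u, v)) not in existing
    }
    for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(s, 3))
    ```

    On this graph the top-scoring non-edge is (4, 6), a pair involving the degree-1 node 6: both pairs (1, 4) and (4, 6) share one common neighbor, but the degree-product normalization favors the pair with colder ends, which is exactly the behavior the abstract argues helps when probe links sit on low-degree nodes.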