107 research outputs found

    A unifying view for performance measures in multi-class prediction

    Get PDF
    In the last few years, many different performance measures have been introduced to overcome the weakness of the most natural metric, the Accuracy. Among them, Matthews Correlation Coefficient has recently gained popularity among researchers not only in machine learning but also in several application fields such as bioinformatics. Nonetheless, further novel functions are being proposed in literature. We show that Confusion Entropy, a recently introduced classifier performance measure for multi-class problems, has a strong (monotone) relation with the multi-class generalization of a classical metric, the Matthews Correlation Coefficient. Computational evidence in support of the claim is provided, together with an outline of the theoretical explanation

    Stability Indicators in Network Reconstruction

    Full text link
    The number of algorithms available to reconstruct a biological network from a dataset of high-throughput measurements is nowadays overwhelming, but evaluating their performance when the gold standard is unknown is a difficult task. Here we propose to use a few reconstruction stability tools as a quantitative solution to this problem. We introduce four indicators to quantitatively assess the stability of a reconstructed network in terms of variability with respect to data subsampling. In particular, we give a measure of the mutual distances among the set of networks generated by a collection of data subsets (and from the network generated on the whole dataset) and we rank nodes and edges according to their decreasing variability within the same set of networks. As a key ingredient, we employ a global/local network distance combined with a bootstrap procedure. We demonstrate the use of the indicators in a controlled situation on a toy dataset, and we show their application on a miRNA microarray dataset with paired tumoral and non-tumoral tissues extracted from a cohort of 241 hepatocellular carcinoma patients

    The HIM glocal metric and kernel for network comparison and classification

    Full text link
    Due to the ever rising importance of the network paradigm across several areas of science, comparing and classifying graphs represent essential steps in the networks analysis of complex systems. Both tasks have been recently tackled via quite different strategies, even tailored ad-hoc for the investigated problem. Here we deal with both operations by introducing the Hamming-Ipsen-Mikhailov (HIM) distance, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices. The new measure combines the local Hamming distance and the global spectral Ipsen-Mikhailov distance so to overcome the drawbacks affecting the two components separately. Building then the HIM kernel function derived from the HIM distance it is possible to move from network comparison to network classification via the Support Vector Machine (SVM) algorithm. Applications of HIM distance and HIM kernel in computational biology and social networks science demonstrate the effectiveness of the proposed functions as a general purpose solution.Comment: Frontiers of Network Analysis: Methods, Models, and Applications - NIPS 2013 Worksho

    Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers

    Full text link
    We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties, and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n=1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. Availability and Implementation: Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).Comment: Bioinformatics 2012, in pres

    A Version of Jung’s Synchronicity in the Event of Correlation of Mental Processes in the Past and the Future: Possible Role of Quantum Entanglement in Quantum Vacuum

    Get PDF
    This paper deals with the version of Jung’s synchronicity in which correlation between mental processes of two different persons takes place not just in the case when at a certain moment of time the subjects are located at a distance from each other, but also in the case when both persons are alternately (and sequentially, one after the other) located in the same point of space. In this case, a certain period of time lapses between manifestation of mental process in one person and manifestation of mental process in the other person. Transmission of information from one person to the other via classical communication channel is ruled out. The author proposes a hypothesis, whereby such manifestation of synchronicity may become possible thanks to existence of quantum entanglement between the past and the future within the light cone. This hypothesis is based on the latest perception of the nature of quantum vacuu

    Algebraic Comparison of Partial Lists in Bioinformatics

    Get PDF
    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset
    • …
    corecore