
    Gravity as a Gauge Theory on Three-Dimensional Noncommutative spaces

    We plan to translate the successful description of three-dimensional gravity as a gauge theory into the noncommutative framework, making use of covariant coordinates. We consider two specific three-dimensional fuzzy spaces, based on SU(2) and SU(1,1), which carry appropriate symmetry groups. These are the groups we gauge in order to obtain the transformations of the gauge fields (dreibein, spin connection and two extra Maxwell fields due to noncommutativity) and their corresponding curvatures, and eventually to determine the action and the equations of motion. Finally, we verify their connection to three-dimensional gravity. Comment: arXiv admin note: text overlap with arXiv:1802.0755
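For orientation, the commutative description being translated here is the standard Chern–Simons formulation of three-dimensional gravity (a textbook result, stated as background rather than taken from this abstract): the dreibein and spin connection combine into a single gauge field whose Chern–Simons action reproduces the gravity equations of motion.

```latex
% 3D gravity as a Chern-Simons gauge theory: the dreibein e^a and
% spin connection \omega^a are packaged into one gauge field A.
A = e^{a}\,P_{a} + \omega^{a}\,J_{a},
\qquad
S_{\mathrm{CS}}[A] \;=\; \frac{k}{4\pi}\int_{M_{3}}
\mathrm{Tr}\!\left( A \wedge \mathrm{d}A
  + \tfrac{2}{3}\, A \wedge A \wedge A \right).
```

Varying the action with respect to the dreibein and the spin connection yields the torsion and curvature constraints; the covariant-coordinate construction of the paper deforms this picture to the fuzzy spaces above.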

    The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment

    To assess the quality of a binary classification, researchers often take advantage of a four-entry contingency table called a confusion matrix, containing true positives, true negatives, false positives, and false negatives. To recap the four values of a confusion matrix in a unique score, researchers and statisticians have developed several rates and metrics. Several previous scientific studies have already shown why the Matthews correlation coefficient (MCC) is more informative and trustworthy than confusion entropy, accuracy, F1 score, bookmaker informedness, markedness, and balanced accuracy. In this study, we compare the MCC with the diagnostic odds ratio (DOR), a statistical measure sometimes employed in the biomedical sciences. After examining the properties of the MCC and of the DOR, we describe the relationships between them, also taking advantage of an innovative geometrical plot called the confusion tetrahedron, presented here for the first time. We then report some use cases where the MCC and the DOR produce discordant outcomes, and explain why the Matthews correlation coefficient is the more informative and reliable of the two. Our results can have a strong impact in computer science and statistics, because they clearly explain why the Matthews correlation coefficient is more trustworthy than the diagnostic odds ratio.
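As a quick illustration of the two scores compared in this abstract, both can be computed directly from the four confusion-matrix entries (a minimal sketch, not code from the paper; the example counts are invented):

```python
import math

def confusion_scores(tp, tn, fp, fn):
    """Compute MCC and DOR from the four confusion-matrix entries.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    DOR = (TP/FN) / (FP/TN) = (TP*TN) / (FP*FN)
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # DOR is undefined (infinite) when FP or FN is zero
    dor = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return mcc, dor

# Hypothetical imbalanced test set: 89 TP, 5 TN, 5 FP, 1 FN
mcc, dor = confusion_scores(tp=89, tn=5, fp=5, fn=1)
```

Note that the DOR uses only the two ratios TP/FN and FP/TG, so it is blind to class imbalance in a way the MCC is not, which is the source of the discordant cases the paper reports.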

    A machine learning pipeline for quantitative phenotype prediction from genotype data

    Background: Quantitative phenotypes emerge everywhere in systems biology and biomedicine, due either to a direct interest in quantitative traits or to high individual variability that makes it hard or impossible to classify samples into distinct categories, as is often the case with complex common diseases. Machine learning approaches to genotype-phenotype mapping may significantly improve Genome-Wide Association Study (GWAS) results by explicitly focusing on predictivity and optimal feature selection in a multivariate setting. It is, however, essential that stringent and well-documented Data Analysis Protocols (DAPs) are used to control sources of variability and ensure reproducibility of results. We present a genome-to-phenotype pipeline of machine learning modules for quantitative phenotype prediction. The pipeline can be applied for the direct use of whole-genome information in functional studies. As a realistic example, we consider the problem of fitting complex phenotypic traits in heterogeneous stock mice from single nucleotide polymorphisms (SNPs).
    Methods: The core element of the pipeline is the L1L2 regularization method based on the naïve elastic net. The method gives at the same time a regression model and a dimensionality reduction procedure suitable for correlated features. The model and SNP markers are selected through a DAP originally developed in the MAQC-II collaborative initiative of the U.S. FDA for the identification of clinical biomarkers from microarray data. The L1L2 approach is compared with standard Support Vector Regression (SVR) and with Recursive Jump Markov Chain Monte Carlo (MCMC). Algebraic indicators of stability of partial lists are used for model selection; the final panel of markers is obtained by a procedure at the chromosome scale, termed 'saturation', to recover SNPs in linkage disequilibrium with those selected.
    Results: The L1L2 pipeline obtains accuracies comparable to both MCMC and SVR. Good agreement is also found between the SNPs selected by the L1L2 algorithms and candidate loci previously identified by a standard GWAS. The combination of L1L2-based feature selection with the saturation procedure tackles the issue of neglecting highly correlated features that affects many feature selection algorithms.
    Conclusions: The L1L2 pipeline has proven effective in terms of marker selection and prediction accuracy. This study indicates that machine learning techniques may support quantitative phenotype prediction, provided that adequate DAPs are employed to control bias in model selection.
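The elastic-net penalty at the core of the pipeline combines an L1 term (driving irrelevant features exactly to zero) with an L2 term (keeping groups of correlated features together, which matters for SNPs in linkage disequilibrium). A minimal stdlib sketch of naïve elastic-net regression via coordinate descent (illustrative only; the paper's L1L2 implementation and its regularization path are more elaborate):

```python
def soft_threshold(rho, lam):
    """L1 shrinkage operator: pulls rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def elastic_net(X, y, lam1=0.1, lam2=0.1, n_iter=200):
    """Naive elastic-net regression by coordinate descent.

    Minimizes (1/2n)*||y - Xw||^2 + lam1*||w||_1 + (lam2/2)*||w||^2.
    X is a list of rows; returns the weight vector w.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual
            # (prediction computed without feature j)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k]
                                      for k in range(p) if k != j))
                for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # L1 shrinkage in the numerator, L2 ridge term in the denominator
            w[j] = soft_threshold(rho, lam1) / (z + lam2)
    return w

# Toy data: the target depends on feature 0 only; feature 1 is inert
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
y = [1.0, 2.0, 3.0, 4.0]
w = elastic_net(X, y)
```

On this toy input the inert feature gets weight exactly zero, while the informative one is slightly shrunk below 1 by the penalties, which is the sparsity-plus-shrinkage behavior the pipeline relies on.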

    Supervised classification of combined copy number and gene expression data

    In this paper we apply a predictive profiling method to genome copy number aberrations (CNA) in combination with gene expression and clinical data to identify molecular patterns of cancer pathophysiology. Predictive models and optimal feature lists for the platforms are developed by a complete-validation SVM-based machine learning system. Ranked lists of genome CNA sites (assessed by comparative genomic hybridization arrays – aCGH) and of differentially expressed genes (assessed by microarray profiling with Affy HG-U133A chips) are computed and combined on a breast cancer dataset for the discrimination of Luminal/ER+ (Lum/ER+) and Basal-like/ER- classes. Different encodings are developed and applied to the CNA data, and predictive variable selection is discussed. We analyze the combination of profiling information between the platforms, also considering the pathophysiological data. A specific subset of patients is identified that responds differently to classification by chromosomal gains and losses and by differentially expressed genes, corroborating the idea that genomic CNA can represent an independent source for tumor classification.
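A common way to make aCGH data usable alongside expression features is a ternary encoding of gains and losses before concatenating the two platforms into one feature vector. A minimal sketch (the thresholds and the plain concatenation are illustrative assumptions, not the specific encodings developed in the paper):

```python
def encode_cna(log_ratios, loss_thr=-0.2, gain_thr=0.2):
    """Ternary encoding of aCGH log-ratios per site:
    -1 = loss, 0 = normal copy number, +1 = gain.
    Thresholds here are illustrative, not taken from the paper."""
    return [-1 if v < loss_thr else 1 if v > gain_thr else 0
            for v in log_ratios]

def combine(cna_profile, expression_profile):
    """Concatenate encoded CNA features with expression features
    into a single vector for a downstream SVM classifier."""
    return encode_cna(cna_profile) + list(expression_profile)

# One hypothetical sample: two CNA sites, three expression probes
features = combine([-0.5, 0.3], [2.1, 0.7, 1.4])
```

Because the CNA block is discrete and the expression block continuous, per-platform scaling (or per-platform kernels) is typically needed before feeding such vectors to an SVM.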

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling the biological phenotype in terms of a classification or regression model. Due to resampling protocols, or simply within a meta-analysis comparison, one often obtains not a single list but a set of alternative feature lists, possibly of different lengths. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
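The flavor of such a list-distance can be conveyed with a simple Canberra-type distance between rank vectors, where features absent from a partial list are conventionally placed one rank past its end (a crude proxy; the paper develops a proper group-theoretic treatment of partial lists rather than this fixed-rank convention):

```python
def canberra_partial(list_a, list_b, universe):
    """Canberra-type distance between two partial ranked lists.

    Each feature in `universe` gets rank position i+1 if it appears
    at index i of a list, and rank len(list)+1 if it is absent
    (an illustrative convention).  Identical lists give distance 0;
    larger values mean less stable rankings.
    """
    ra = {f: i + 1 for i, f in enumerate(list_a)}
    rb = {f: i + 1 for i, f in enumerate(list_b)}
    fill_a, fill_b = len(list_a) + 1, len(list_b) + 1
    d = 0.0
    for f in universe:
        x, y = ra.get(f, fill_a), rb.get(f, fill_b)
        d += abs(x - y) / (x + y)
    return d

# Two resampling runs ranking genes from a three-gene universe
d = canberra_partial(["g1", "g2"], ["g2", "g1"], ["g1", "g2", "g3"])
```

Averaging such pairwise distances over all list pairs produced by a resampling protocol gives a single stability indicator for the feature-selection procedure.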

    Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

    MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by the different methods, the use of external cross-validation loops helps in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
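The "external cross-validation loop" recommended above means redoing feature selection inside every outer training fold, so the held-out fold never influences the selected panel. A generic stdlib sketch of that protocol (the `select`/`fit`/`score` callables are placeholders for whichever selector, model and metric a study uses):

```python
import random

def kfold(n, k):
    """Split sample indices 0..n-1 into k shuffled folds."""
    idx = list(range(n))
    random.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv(X, y, select, fit, score, k_outer=5):
    """External CV loop for biomarker discovery.

    Feature selection (`select`) is re-run on every outer training
    fold, so the test fold never leaks into the selected panel --
    the bias control the study recommends.  Returns per-fold scores
    and the per-fold feature panels (whose overlap measures stability).
    """
    results, panels = [], []
    n = len(y)
    for test_idx in kfold(n, k_outer):
        train_idx = [i for i in range(n) if i not in set(test_idx)]
        feats = select([X[i] for i in train_idx], [y[i] for i in train_idx])
        model = fit([[X[i][j] for j in feats] for i in train_idx],
                    [y[i] for i in train_idx])
        results.append(score(model,
                             [[X[i][j] for j in feats] for i in test_idx],
                             [y[i] for i in test_idx]))
        panels.append(feats)
    return results, panels
```

Comparing the `panels` returned for the different folds is exactly where the limited-overlap problem described in the abstract becomes visible.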

    Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms

    Motivation: Reconstructing the topology of a gene regulatory network is one of the key tasks in systems biology. Despite the wide variety of proposed methods, very little work has been dedicated to the assessment of their stability properties. Here we present a methodical comparison of the performance of a novel method (RegnANN) for gene network inference based on multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER), focusing our analysis on the prediction variability induced by both the intrinsic structure of the network and the available data. Results: The extensive evaluation on both synthetic data and a selection of gene modules of "Escherichia coli" indicates that all the algorithms suffer from instability and variability issues with regard to the reconstruction of the network topology. This instability makes it objectively very hard to establish which method performs best. Nevertheless, RegnANN shows MCC scores that compare very favorably with all the other inference methods tested. Availability: The software for the RegnANN inference algorithm is distributed under GPL3 and is available at the corresponding author's home page (http://mpba.fbk.eu/grimaldi/regnann-supmat).
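The MCC scores quoted for topology reconstruction are typically obtained by treating every possible edge of the network as one binary prediction and comparing the inferred adjacency matrix against the true one. A small sketch of that evaluation step (an illustrative convention, not the paper's exact benchmarking code):

```python
import math

def edge_mcc(true_adj, pred_adj):
    """MCC between two binary adjacency matrices, treating each
    potential edge as a single binary classification."""
    tp = tn = fp = fn = 0
    for t_row, p_row in zip(true_adj, pred_adj):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif t and not p:
                fn += 1
            elif p:
                fp += 1
            else:
                tn += 1
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

# A perfectly reconstructed two-node network scores 1.0
truth = [[0, 1], [1, 0]]
```

Because regulatory networks are sparse, the true-negative count dominates, which is precisely why a balance-aware score like the MCC is preferred here over plain accuracy.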

    Neuromechanical and environment aware machine learning tool for human locomotion intent recognition

    Current research suggests the emergent need to recognize and predict locomotion modes (LMs) and LM transitions to allow a natural and smooth response of lower limb active assistive devices, such as prostheses and orthoses, for daily life locomotion assistance. This Master dissertation proposes an automatic and user-independent recognition and prediction tool based on machine learning methods. Further, it seeks to determine the gait measures that yield the best performance in recognizing and predicting several human daily performed LMs and respective LM transitions. The machine learning framework was established using a Gaussian support vector machine (SVM) and discriminative features estimated from three wearable sensors, namely inertial, force and laser sensors. In addition, a neuro-biomechanical model was used to compute joint angles and muscle activations, which were fused with the sensor-based features. Results showed that combining biomechanical features from the Xsens sensors with environment-aware features from the laser sensor resulted in the best recognition and prediction of LMs (MCC = 0.99 and MCC = 0.95) and LM transitions (MCC = 0.96 and MCC = 0.98). Moreover, the predicted LM transitions were determined well in advance, since their detection happened one or more steps before the LM transition occurred.
    The developed framework has the potential to improve the assistance delivered by locomotion assistive devices to achieve a more natural and smooth motion assistance.
    This work has been supported in part by the Fundação para a Ciência e Tecnologia (FCT) with the Reference Scholarship under Grant SFRH/BD/108309/2015, in part by the FEDER Funds through the Programa Operacional Regional do Norte and national funds from FCT with the project SmartOs - Controlo Inteligente de um Sistema Ortótico Ativo e Autónomo - under Grant NORTE-01-0145-FEDER-030386, and in part by the FEDER Funds through the COMPETE 2020 - Programa Operacional Competitividade e Internacionalização (POCI) - with the Reference Project under Grant POCI-01-0145-FEDER-006941.
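The two building blocks described in this abstract, discriminative features extracted from sensor streams and a Gaussian (RBF) kernel for the SVM, can be sketched with the standard library as follows (window length, step and gamma are illustrative values, not those of the dissertation):

```python
import math
import statistics

def window_features(signal, win=50, step=25):
    """Sliding-window feature extraction from one sensor channel:
    mean, standard deviation, min and max per window.  Window length
    and overlap are illustrative choices."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append((statistics.mean(w), statistics.pstdev(w),
                      min(w), max(w)))
    return feats

def rbf_kernel(a, b, gamma=0.1):
    """Gaussian kernel used by an RBF SVM: k(a,b) = exp(-gamma*||a-b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
```

In a framework like the one described, such per-channel feature tuples from the inertial, force and laser sensors (plus the model-based joint angles and muscle activations) would be concatenated into one vector per window before training the SVM.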