29 research outputs found

    An overview of data integration in neuroscience with focus on Alzheimer's Disease

    This work represents the first attempt to provide an overview of how to approach data integration as the result of a dialogue between neuroscientists and computer scientists. Data integration is fundamental for studying complex multifactorial diseases, such as the neurodegenerative diseases. This work aims to warn readers of common pitfalls and critical issues in both the medical and data science fields. In this context, we define a road map for data scientists who are approaching data integration in the biomedical domain for the first time, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale and noisy data and proposing possible solutions. Here, we discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to Alzheimer's Disease (AD), which is the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly with a view to early AD diagnosis.

    New miRNA Signature Heralds Human NK Cell Subsets at Different Maturation Steps: Involvement of miR-146a-5p in the Regulation of KIR Expression

    Natural killer (NK) cells are cytotoxic innate lymphoid cells that play an important role in early host defense against infectious pathogens and in surveillance against tumors. In humans, NK cells may be divided into various subsets on the basis of the relative expression of CD56 and of the low-affinity FcγRIIIA CD16. In particular, the two main NK cell subsets are the CD56bright/CD16−/dim and the CD56dim/CD16bright NK cells. Experimental evidence indicates that CD56bright and CD56dim NK cells represent different maturation stages of the NK cell developmental pathway. We identified multiple miRNAs differentially expressed in CD56bright/CD16− and CD56dim/CD16bright NK cells using both univariate and multivariate analyses. Among these, we found a few miRNAs with a consistent differential expression in the two NK cell subsets, and with an intermediate expression in the CD56bright/CD16dim NK cell subset, which represents a transitional step in NK cell maturation. These analyses allowed us to establish the existence of a miRNA signature able to efficiently discriminate the two main NK cell subsets regardless of their surface phenotype. In addition, by analyzing the putative targets of representative miRNAs, we show that hsa-miR-146a-5p may be involved in the regulation of killer Ig-like receptor (KIR) expression. These results contribute to a better understanding of the physiologic significance of miRNAs in the regulation of the development and function of human NK cells. Moreover, our results suggest that hsa-miR-146a-5p targeting, resulting in KIR down-regulation, may be exploited to generate or increase the effect of NK KIR-mismatching against HLA-class I+ tumor cells and thus improve NK-mediated anti-tumor activity.

    Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

    MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, on both simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although different methods reached comparable classification accuracy, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.

    A computational procedure for functional characterization of potential marker genes from molecular data: Alzheimer's as a case study

    BACKGROUND: A molecular characterization of Alzheimer's Disease (AD) is the key to the identification of altered gene sets that lead to AD progression. We rely on the assumption that candidate marker genes for a given disease belong to specific pathogenic pathways, and we aim at unveiling those pathways that are stable across tissues, treatments and measurement systems. In this context, we analyzed three heterogeneous datasets, two microarray gene expression sets and one protein abundance set, applying a recently proposed feature selection method based on regularization. RESULTS: For each dataset we identified a signature that was subsequently evaluated from both the computational and the functional characterization viewpoints, estimating the classification error and retrieving the most relevant biological knowledge from different repositories. Each signature includes genes already known to be related to AD and genes that are likely to be involved in the pathogenesis or in the disease progression. The integrated analysis revealed a meaningful overlap at the functional level. CONCLUSIONS: The identification of three gene signatures showing a relevant overlap of pathways and ontologies increases the likelihood of finding potential marker genes for AD.

    Adenine: A HPC-oriented tool for biological data exploration

    adenine is a machine learning framework designed for biological data exploration and visualization. Its goal is to help bioinformaticians get a first, quick overview of the main structures underlying their data. This software tool encompasses state-of-the-art techniques for missing-value imputation, data preprocessing, dimensionality reduction and clustering. adenine has a scalable architecture which works seamlessly on single workstations as well as on high-performance computing facilities. adenine is capable of generating publication-ready plots along with quantitative descriptions of the results. In this paper we provide an example of exploratory analysis on a publicly available gene expression data set of colorectal cancer samples. The software and its documentation are available at https://github.com/slipguru/adenine under the FreeBSD license.

    Enhancing Interpretability of Gene Signatures with Prior Biological Knowledge

    Biological interpretability is a key requirement for the output of microarray data analysis pipelines. The most widely used pipeline first identifies a gene signature from the acquired measurements and then uses gene enrichment analysis as a tool for functionally characterizing the obtained results. Recently, Knowledge Driven Variable Selection (KDVS), an alternative approach which performs both steps at the same time, has been proposed. In this paper, we assess the effectiveness of KDVS against standard approaches on a Parkinson’s Disease (PD) dataset. The presented quantitative analysis is made possible by the construction of a reference list of genes and gene groups associated with PD. Our work shows that KDVS is much more effective than the standard approach in enhancing the interpretability of the obtained results.