86 research outputs found

    A human–AI collaboration workflow for archaeological sites detection

    This paper illustrates the results obtained by using pre-trained semantic segmentation deep learning models for the detection of archaeological sites within the Mesopotamian floodplains environment. The models were fine-tuned using openly available satellite imagery and vector shapes coming from a large corpus of annotations (i.e., surveyed sites). A randomized test showed that the best model reaches a detection accuracy in the neighborhood of 80%. Integrating domain expertise was crucial to define how to build the dataset and how to evaluate the predictions, since deciding whether a proposed mask counts as a prediction is very subjective. Furthermore, even an inaccurate prediction can be useful when put into context and interpreted by a trained archaeologist. Building on these considerations, we close the paper with a vision for a Human–AI collaboration workflow. Starting with an annotated dataset that is refined by the human expert, we obtain a model whose predictions can either be combined to create a heatmap, to be overlaid on satellite and/or aerial imagery, or be vectorized to make further analysis in GIS software easier and automatic. In turn, the archaeologists can analyze the predictions, organize their onsite surveys, and refine the dataset with new, corrected annotations.

    Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine

    Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.

    DDGun: An untrained method for the prediction of protein stability changes upon single and multiple point variations

    Background: Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (ΔΔG) between wild type and variant proteins, using sequence and structure information. Most of the available methods, however, do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ΔΔG value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ΔΔG(A → B) = -ΔΔG(B → A), where A and B are amino acids. Results: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ΔΔG for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ΔΔG values of ~0.5 and ~0.4 for single and multiple site variations, respectively. Surprisingly, DDGun's performance is comparable with that of state-of-the-art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction, predicting the value of the ΔΔG of a reciprocal variation as almost equal (depending on the sequence profile) to -ΔΔG of the direct variation. This is a valuable property that is missing in the majority of the methods. Conclusions: Evolutionary information alone combined in an untrained method can achieve remarkably high performances in the prediction of ΔΔG upon protein mutation. Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods.
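
    The anti-symmetry property described in this abstract can be illustrated with a minimal sketch. This is not DDGun's scoring function: the profile values, `PROFILE`, `log_odds`, and `toy_ddg` are all hypothetical names and numbers chosen for illustration. The sketch assumes a single fixed sequence profile, so the reverse score is exactly the negative of the forward score; the abstract notes that in DDGun the two are only almost equal, depending on the sequence profile.

```python
import math

# Hypothetical amino-acid frequencies at one position (a toy sequence profile).
# These numbers are made up for illustration and come from no real alignment.
PROFILE = {"A": 0.50, "V": 0.30, "G": 0.15, "P": 0.05}


def log_odds(residue: str) -> float:
    """Log-frequency of a residue in the toy profile."""
    return math.log(PROFILE[residue])


def toy_ddg(wild_type: str, variant: str) -> float:
    """Toy stability score: a difference of two per-residue terms.

    Swapping the arguments flips the sign of the difference, so the
    score is anti-symmetric by construction.
    """
    return log_odds(variant) - log_odds(wild_type)


forward = toy_ddg("A", "P")  # score for A -> P
reverse = toy_ddg("P", "A")  # score for the reverse variation P -> A
print(forward, reverse)
```

    Any feature written as a difference f(B) - f(A) has this property, which is one simple way to obtain anti-symmetry without training.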

    A comparison of lysosomal enzymes expression levels in peripheral blood of mild- and severe-Alzheimer’s disease and MCI patients: implications for regenerative medicine approaches

    The association of lysosomal dysfunction and neurodegeneration has been documented in several neurodegenerative diseases, including Alzheimer’s Disease (AD). Herein, we investigate the association of lysosomal enzymes with AD at different stages of progression of the disease (mild and severe) or with mild cognitive impairment (MCI). We conducted a screening of two classes of lysosomal enzymes: glycohydrolases (β-Hexosaminidase, β-Galactosidase, β-Galactosylcerebrosidase, β-Glucuronidase) and proteases (Cathepsins S, D, B, L) in peripheral blood samples (blood plasma and PBMCs) from mild AD, severe AD, MCI and healthy control subjects. We confirmed the lysosomal dysfunction in severe AD patients and added new findings enhancing the association of abnormal levels of specific lysosomal enzymes with mild AD or severe AD, and highlighting the difference of AD from MCI. Herein, we showed for the first time the specific alteration of β-Galactosidase (Gal) and β-Galactosylcerebrosidase (GALC) in MCI patients. It is notable that in the above peripheral biological samples the lysosomes are more sensitive to AD cellular metabolic alteration when compared to levels of Aβ-peptide or Tau proteins, which are similar in both AD groups analyzed. Collectively, our findings support the role of lysosomal enzymes as potential peripheral molecules that vary with the progression of AD, and make them useful for monitoring regenerative medicine approaches for AD.

    CNV-ClinViewer: Enhancing the clinical interpretation of large copy-number variants online

    Purpose: Large copy number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the integration and analysis of information from multiple scattered sources by experts. Methods: We developed a web-application combining >250,000 patient and population CNVs together with a large set of biomedical annotations and provide tools for CNV classification based on ACMG/ClinGen guidelines and gene-set enrichment analyses. Results: Here, we introduce the CNV-ClinViewer (https://cnv-ClinViewer.broadinstitute.org), an open-source web-application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface. Conclusion: Overall, this resource facilitates semi-automated clinical CNV interpretation and genomic loci exploration and, in combination with clinical judgment, enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. The CNV-ClinViewer thereby enhances patient care for clinical investigators and translational genomic research for basic scientists.

    Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals

    Copy number variants (CNV) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, 17q21.31, duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, 20q13.33, and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.

    Robust determinants of thermostability highlighted by a codon frequency index capable of discriminating thermophilic from mesophilic genomes.

    Abstract: Can genome analysis tell us about the lifestyle of an organism? We ask this question considering a thorough cross comparison of thermophilic and mesophilic genomes, since presently the number of available genomes is enough to ensure statistical significance of the results. We analyze, by means of principal component analysis (PCA), the codon composition of a database comprising 116 genomes, selected so as to include one species for each genus and show that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level. The results of our analysis indicate that all the known features of thermostability can be found in the 64 component loadings of the second principal axis of PCA. By this, we develop an index of thermostability whose discriminative power between mesophiles and thermophiles scores with 98% accuracy at the genome level and with 95% accuracy at the protein sequence level. We also prove that these results are not due to phylogenetic differences between archaea and bacteria
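
    The PCA step described in this abstract can be sketched as follows. This is an illustrative sketch only: the codon frequencies are random synthetic numbers, not the 116 genomes analyzed in the paper, and the variable names are assumptions. It shows how a 64-component loading vector for the second principal axis is obtained and how projecting each genome onto it yields one score per genome. How such a score is thresholded to separate thermophiles from mesophiles is not stated in the abstract and is left out.

```python
import numpy as np

# Synthetic stand-in data: 116 genomes x 64 codons (random, NOT real genomes).
rng = np.random.default_rng(0)
n_genomes, n_codons = 116, 64
counts = rng.random((n_genomes, n_codons))

# Convert raw counts to per-genome codon frequencies (each row sums to 1).
freqs = counts / counts.sum(axis=1, keepdims=True)

# PCA via SVD of the column-centered matrix; the rows of vt are the
# principal axes, ordered by decreasing singular value.
centered = freqs - freqs.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# The 64 loadings of the second principal axis.
second_axis = vt[1]

# Projecting every genome onto that axis gives one scalar score per genome.
index = centered @ second_axis
print(second_axis.shape, index.shape)
```

    With real codon-usage data, the sign and magnitude of each of the 64 loadings would indicate which codons drive the separation along that axis.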