    FeatureMap3D—a tool to map protein features and sequence conservation onto homologous structures in the PDB

    FeatureMap3D is a web-based tool that maps protein features onto 3D structures. The user provides sequences annotated with any feature of interest, such as post-translational modifications, protease cleavage sites or exonic structure, and FeatureMap3D then searches the Protein Data Bank (PDB) for structures of homologous proteins. The results are displayed both as an annotated sequence alignment, which shows the user-provided annotations and the sequence conservation between the query and the target sequence, and as a publication-quality image of the 3D protein structure with the selected features and sequence conservation highlighted. The results are also returned in a readily parsable text format and as a PyMOL script file, which allows the user to easily modify the protein structure image to suit a specific purpose. FeatureMap3D can also be used without sequence annotation, to evaluate the quality of the alignment of the input sequences to the most homologous structures in the PDB, through the sequence conservation colored 3D structure visualization tool. FeatureMap3D is available at:

    Benchmarking software tools for trimming adapters and merging next-generation sequencing data for ancient DNA

    Ancient DNA is highly degraded, resulting in very short sequences. Reads generated with modern high-throughput sequencing machines are generally longer than ancient DNA molecules, so the reads often contain some portion of the sequencing adapters. It is crucial to remove those adapters, as they can interfere with downstream analysis. Furthermore, when DNA has been read forward and backward (paired-end), the overlapping portions can be merged to correct sequencing errors and improve read quality. Several tools have been developed for adapter trimming and read merging; however, no one has attempted to evaluate their accuracy and their potential impact on downstream analyses. Through the simulation of sequencing data, seven commonly used tools were analyzed in their ability to reconstruct ancient DNA sequences through read merging. The analyzed tools exhibit notable differences in their abilities to correct sequence errors and identify the correct read overlap, but the most substantial difference is observed in their ability to calculate quality scores for merged bases. Selecting the most appropriate tool for a given project depends on several factors, although some tools such as fastp have some shortcomings, whereas others like leeHom outperform the other tools in most aspects. While the choice of tool did not result in a measurable difference when analyzing population genetics using principal component analysis, it is important to note that downstream analyses that are sensitive to wrongly merged reads or that rely on quality scores can be significantly impacted by the choice of tool.
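
The read-merging idea described in this abstract can be illustrated with a minimal Python sketch. It assumes adapters have already been trimmed and uses a deliberately simplified consensus rule (agreeing bases get the summed quality, capped at 41; disagreeing bases keep the higher-quality call with the quality difference). This heuristic is an assumption for illustration only and is not the algorithm of leeHom, fastp, or any other benchmarked tool; the names `merge_pair`, `min_overlap`, and `max_mismatch_rate` are hypothetical.

```python
# Illustrative paired-end read merging (simplified heuristic, not any tool's method).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, min_overlap=5, max_mismatch_rate=0.2):
    """Merge a forward read (r1) and a reverse read (r2) into one sequence.

    q1 and q2 are lists of integer Phred scores. Returns (sequence, qualities),
    or None when no acceptable overlap is found.
    """
    # Bring the reverse read onto the forward strand.
    r2 = reverse_complement(r2)
    q2 = q2[::-1]

    # Try the longest overlap first: suffix of r1 against prefix of r2.
    for olen in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        suffix = r1[len(r1) - olen:]
        prefix = r2[:olen]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * olen:
            break
    else:
        return None  # no overlap satisfied the mismatch threshold

    start = len(r1) - olen
    seq = list(r1[:start])
    quals = list(q1[:start])
    for i in range(olen):
        b1, p1 = r1[start + i], q1[start + i]
        b2, p2 = r2[i], q2[i]
        if b1 == b2:
            # Two agreeing observations: raise confidence (capped, assumed cap 41).
            seq.append(b1)
            quals.append(min(p1 + p2, 41))
        elif p1 >= p2:
            seq.append(b1)
            quals.append(p1 - p2)
        else:
            seq.append(b2)
            quals.append(p2 - p1)
    seq.extend(r2[olen:])
    quals.extend(q2[olen:])
    return "".join(seq), quals
```

In this sketch a low-quality miscall in one read is overruled by a high-quality call in its mate, which is the error-correcting effect the abstract refers to. How the merged quality scores are computed is exactly where the abstract reports the largest differences between real tools, so the rule above should not be taken as representative.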

    HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph.

    Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and perform inference based on the detected mutations to this reference. This approach biases haplogroup assignments towards the reference and prohibits accurate calculations of the uncertainty in assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier which uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools by being more robust to lower coverage or incomplete consensus sequences and producing phylogenetically-aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a text file with the haplogroup assignments of the samples along with the level of confidence in the assignments. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.

    euka: Robust tetrapodic and arthropodic taxa detection from modern and ancient environmental DNA using pangenomic reference graphs

    1. Ancient environmental DNA (aeDNA) is a crucial source of information for past environmental reconstruction. However, the computational analysis of aeDNA involves the inherited challenges of ancient DNA (aDNA) and the typical difficulties of eDNA samples, such as taxonomic identification and abundance estimation of identified taxonomic groups. Current methods for aeDNA fall into those that only perform mapping followed by taxonomic identification and those that purport to do abundance estimation. The former leaves abundance estimates to users, while methods for the latter are not designed for large metagenomic datasets and are often imprecise and challenging to use. 2. Here, we introduce euka, a tool designed for rapid and accurate characterisation of aeDNA samples. We use a taxonomy-based pangenome graph of reference genomes for robustly assigning DNA sequences and use a maximum-likelihood framework for abundance estimation. At present, our database is restricted to mitochondrial genomes of tetrapods and arthropods but can be expanded in future versions. 3. We find euka to outperform current taxonomic profiling tools and their abundance estimates. Crucially, we show that regardless of the filtering threshold set by existing methods, euka demonstrates higher accuracy. Furthermore, our approach is robust to sparse data, which is characteristic of aeDNA, detecting a taxon with an average of 50 reads aligning. We also show that euka is consistent with competing tools on empirical samples. 4. euka's features are fine-tuned to deal with the challenges of aeDNA, making it a simple-to-use, all-in-one tool. It is available on GitHub: https://github.com/grenaud/vgan. euka enables researchers to quickly assess and characterise their sample, thus allowing it to be used as a routine screening tool for aeDNA.

    RosettaDDGPrediction for high-throughput mutational scans: From stability to binding

    Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein–protein interaction. Advances in experimental mutational scans allow high‐throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease‐related variants that can benefit from analyses with structure‐based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high‐throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high‐throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs and aggregates raw data for multiple variants, as well as generates publication‐ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔGs values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction

    Nationwide germline whole genome sequencing of 198 consecutive pediatric cancer patients reveals a high frequency of cancer prone syndromes

    Purpose: Historically, cancer predisposition syndromes (CPSs) were rarely established for children with cancer. This nationwide, population-based study investigated how frequently children with cancer had or were likely to have a CPS. Methods: Children (0-17 years) in Denmark with newly diagnosed cancer were invited to participate in whole-genome sequencing of germline DNA. Suspicion of CPS was assessed according to Jongmans'/McGill Interactive Pediatric OncoGenetic Guidelines (MIPOGG) criteria and familial cancer diagnoses were verified using population-based registries. Results: 198 of 235 (84.3%) eligible patients participated, of whom 94/198 (47.5%) carried pathogenic variants (PVs) in a CPS gene or had clinical features indicating CPS. Twenty-nine of 198 (14.6%) patients harbored a CPS, of whom 21/198 (10.6%) harbored a childhood-onset and 9/198 (4.5%) an adult-onset CPS. In addition, 23/198 (11.6%) patients carried a PV associated with biallelic CPS. Seven of the 54 (12.9%) patients carried two or more variants in different CPS genes. Seventy of 198 (35.4%) patients fulfilled the Jongmans' and/or MIPOGG criteria indicating an underlying CPS, including two of the 9 (22.2%) patients with an adult-onset CPS versus 18 of the 21 (85.7%) patients with a childhood-onset CPS (p = 0.0022), eight of the additional 23 (34.8%) patients with a heterozygous PV associated with biallelic CPS, and 42 patients without PVs. Children with a central nervous system (CNS) tumor had family members with CNS tumors more frequently than patients with other cancers (11/44, p = 0.04), but 42 of 44 (95.5%) cases did not have a PV in a CPS gene. Conclusion: These results demonstrate the value of systematically screening pediatric cancer patients for CPSs and indicate that a higher proportion of childhood cancers may be linked to predisposing germline variants than previously supposed.