    Structural conservation versus functional divergence of maternally expressed microRNAs in the Dlk1/Gtl2 imprinting region

    Background: MicroRNAs play an important role in post-transcriptional gene regulation. One of the largest known microRNA clusters is located within the imprinted Dlk1/Gtl2 region on human chromosome 14 and mouse chromosome 12. This cluster contains more than 40 microRNA genes that are expressed only from the maternal chromosome in mouse. Results: To shed light on the function of these microRNAs and on possible crosstalk between microRNA-based gene regulation and genomic imprinting, we performed extensive in silico analyses of the microRNAs in this imprinted region and of their predicted target genes. Bioinformatic analysis reveals that these microRNAs are highly conserved between human and mouse. Whereas the microRNA precursors at this locus mostly belong to large sequence families, the mature microRNA sequences are highly divergent. We developed a target gene prediction approach that combines three widely used prediction methods and achieves sufficiently high prediction accuracy. Target gene sets predicted for individual microRNAs derived from the imprinted region show little overlap and do not differ significantly in their properties from target genes predicted for a group of randomly selected microRNAs. The target genes are enriched for long, GC-rich 3' UTR sequences and are preferentially annotated to development, regulatory processes and cell communication. Furthermore, among all analyzed human and mouse genes, the predicted target genes are characterized by consistently higher expression levels in all tissues considered. Conclusion: Our results suggest a complex evolutionary history for the microRNA genes in this imprinted region, including an amplification of microRNA precursors in a mammalian ancestor followed by a rapid divergence of the mature sequences. This produced a broad spectrum of target genes. Further, our analyses did not uncover a functional relation between the imprinted regulation of this microRNA-encoding region and the expression patterns or functions of the predicted target genes. We conclude that these imprinted microRNAs do not regulate a particular set of genes. Rather, they seem to stabilize the expression of a variety of genes and thereby form an integral part of the genome-wide microRNA gene regulatory network.
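
The abstract does not state how the three prediction methods are combined or how overlap between target sets is measured. The following Python sketch assumes a simple majority vote (a gene counts as a target if at least two methods predict it) and uses the Jaccard index for pairwise overlap; the method names, gene symbols and the `min_votes` threshold are hypothetical placeholders, not the authors' pipeline.

```python
from itertools import combinations


def consensus_targets(predictions: dict[str, set[str]], min_votes: int = 2) -> set[str]:
    """Return genes predicted by at least `min_votes` methods (majority vote, assumed)."""
    votes: dict[str, int] = {}
    for genes in predictions.values():
        for gene in genes:
            votes[gene] = votes.get(gene, 0) + 1
    return {gene for gene, n in votes.items() if n >= min_votes}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(target_sets: dict[str, set[str]]) -> float:
    """Average Jaccard index over all pairs of microRNA target sets."""
    pairs = list(combinations(target_sets.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical predictions of three methods for two microRNAs.
    mir_a = consensus_targets({
        "method1": {"GENE1", "GENE2", "GENE3"},
        "method2": {"GENE2", "GENE3", "GENE4"},
        "method3": {"GENE3", "GENE5"},
    })
    mir_b = consensus_targets({
        "method1": {"GENE3", "GENE6"},
        "method2": {"GENE6", "GENE7"},
        "method3": {"GENE6", "GENE8"},
    })
    print(sorted(mir_a), sorted(mir_b))
    print(mean_pairwise_overlap({"miR-a": mir_a, "miR-b": mir_b}))
```

A low mean Jaccard index across microRNAs would correspond to the "little overlap" between target sets reported above.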

    Improved base calling for the Illumina Genome Analyzer using machine learning strategies

    Ibis is an accurate, fast and easy-to-use base caller for the Illumina Genome Analyzer that reduces error rates and increases the output of usable reads.
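
Base callers commonly report their per-base confidence as Phred-scaled quality scores, Q = -10 log10(p), where p is the estimated probability that the call is wrong. The sketch below shows that conversion and the expected number of errors in a read; it is a generic illustration that assumes Phred+33 (Sanger) FASTQ encoding and does not reproduce Ibis itself.

```python
import math


def phred_quality(error_prob: float) -> int:
    """Convert an error probability p into a Phred score Q = -10 * log10(p)."""
    if not 0.0 < error_prob <= 1.0:
        raise ValueError("error probability must lie in (0, 1]")
    return round(-10.0 * math.log10(error_prob))


def error_probability(quality: int) -> float:
    """Invert the Phred transform: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(qualities: list[int]) -> float:
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)


def to_fastq_char(quality: int, offset: int = 33) -> str:
    """Encode a quality score as a FASTQ character (Phred+33 by default)."""
    return chr(quality + offset)


if __name__ == "__main__":
    read_qualities = [40, 40, 30, 20, 10]  # hypothetical qualities for a 5-base read
    print([phred_quality(p) for p in (0.1, 0.01, 0.001)])
    print(round(expected_errors(read_qualities), 4))
    print("".join(to_fastq_char(q) for q in read_qualities))
```

Older Illumina pipelines encoded qualities with an offset of 64, which is why the offset is a parameter here.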

    Understanding and improving high-throughput sequencing data production and analysis

    Advances in DNA sequencing have revolutionized the field of genomics over the last 5 years. New sequencing instruments make it possible to generate large amounts of sequence data rapidly and at substantially lower cost. These high-throughput sequencing technologies (e.g. Roche 454 FLX, Life Technologies SOLiD, Dover Polonator, Helicos HeliScope and Illumina Genome Analyzer) make whole-genome sequencing and resequencing, transcript sequencing, and the quantification of gene expression, DNA-protein interactions and DNA methylation feasible at an unanticipated scale. In the field of evolutionary genomics, high-throughput sequencing has enabled studies of whole genomes from ancient specimens of different hominin groups. Further, it has allowed large-scale population genetics studies of present-day humans as well as different types of sequence-based comparative genomics studies in primates. Such comparisons of humans with closely related apes and hominins are important not only for better understanding human origins and the biological background of what sets humans apart from other organisms, but also for understanding the molecular basis of diseases and disorders, particularly those that affect uniquely human traits, such as speech disorders, autism or schizophrenia. However, while the cost and time required to create comparative data sets have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous approaches. These differences require either an experimental design that circumvents them or analysis methods that handle them. During my PhD, I analyzed and improved current protocols and algorithms for next-generation sequencing data, taking into account the specific characteristics of these new sequencing technologies.
    The presented approaches and algorithms were applied in different projects and are widely used within the Department of Evolutionary Genetics at the Max Planck Institute for Evolutionary Anthropology. In this thesis, I present selected analyses from the whole-genome shotgun sequencing of two ancient hominins and from the quantification of gene expression using short sequence tags in five tissues from three primates.
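
The summary does not describe how expression is quantified from short sequence tags. One common, simple approach, assumed here, is to count the tags assigned to each gene and scale by library size (tags per million). In this sketch the tag-to-gene lookup is a hypothetical placeholder, and tags without an assignment are simply discarded.

```python
from collections import Counter


def tags_per_million(tag_to_gene: dict[str, str], observed_tags: list[str]) -> dict[str, float]:
    """Count tags per gene and normalize by the number of assigned tags (x 1e6)."""
    # Tags missing from the (hypothetical) lookup table are ignored.
    counts = Counter(tag_to_gene[tag] for tag in observed_tags if tag in tag_to_gene)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    lookup = {"CATGAAAC": "GENE1", "CATGTTTG": "GENE2"}  # hypothetical tag sequences
    library = ["CATGAAAC", "CATGAAAC", "CATGTTTG", "CATGGGGG"]
    print(tags_per_million(lookup, library))
```

Scaling by the number of assigned tags makes libraries of different sequencing depth comparable; it does not correct for differences in transcriptome composition between tissues.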

    The impact of different negative training data on regulatory sequence predictions

    Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of how their function is encoded in DNA is still very limited. Machine and deep learning methods can shed light on this encoding; gapped k-mer support vector machines (gkm-SVMs) and convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences for this purpose. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, we compare both learners and two approaches for generating negative training data. The negative sets use either genomic background sequences or shuffles of the positive sequences. Model performance was evaluated on three tasks: predicting elements active in a cell type, predicting cell-type-specific elements, and predicting the relative activity of elements as measured in independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing the best overall results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific and quantitative activity prediction, and seem to learn features of artificial sequences rather than of regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs matched and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs performed robustly and best for typical training dataset sizes without the need for hyperparameter optimization.
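
The two kinds of negative sets compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the shuffle is a plain mononucleotide shuffle (shuffles that preserve dinucleotide or longer k-mer composition are common alternatives), and background selection is a hypothetical greedy scheme that requires equal length and a GC content within `tolerance` of each positive sequence.

```python
import random


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: keeps base composition, destroys sequence order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_matched_background(positives: list[str], candidates: list[str],
                          tolerance: float = 0.02) -> list[str]:
    """Greedily pick, for each positive, one unused candidate of equal length and similar GC."""
    used: set[int] = set()
    matched: list[str] = []
    for pos in positives:
        target = gc_content(pos)
        for i, cand in enumerate(candidates):
            if i in used or len(cand) != len(pos):
                continue
            if abs(gc_content(cand) - target) <= tolerance:
                used.add(i)
                matched.append(cand)
                break  # positives without any match simply get no background sequence
    return matched


if __name__ == "__main__":
    rng = random.Random(0)
    positives = ["GCGCGCAT", "ATATATGC"]  # hypothetical open-chromatin sequences
    genome_windows = ["ATATGCAT", "GGCCGCTA", "TATAGCAT"]  # hypothetical background windows
    print([shuffle_sequence(s, rng) for s in positives])
    print(gc_matched_background(positives, genome_windows))
```

Matching on length and GC content is only one axis; the model biases reported above for insufficiently matched backgrounds suggest checking other properties, such as repeat content, as well.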

    CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

    Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on the classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that the inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although shown here only for splice scores, we believe that this approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
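
How splice scores might enter a genome-wide model can be illustrated with a hypothetical linear (logistic) scorer. The sketch assumes four per-variant delta scores for acceptor and donor gain and loss (the layout reported by tools such as SpliceAI), collapsed to their maximum, plus an indicator for variants without a splice score. The weights, bias and annotation values are placeholders, and the abstract does not state that CADD-Splice encodes its features this way.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpliceDeltas:
    """Hypothetical per-variant splice effect scores, each in [0, 1]."""
    acceptor_gain: float
    acceptor_loss: float
    donor_gain: float
    donor_loss: float


def splice_features(deltas: Optional[SpliceDeltas]) -> list[float]:
    """Return [max delta score, missing indicator]; unscored variants get [0.0, 1.0]."""
    if deltas is None:
        return [0.0, 1.0]
    top = max(deltas.acceptor_gain, deltas.acceptor_loss,
              deltas.donor_gain, deltas.donor_loss)
    return [top, 0.0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combined_score(base_features: list[float], deltas: Optional[SpliceDeltas],
                   weights: list[float], bias: float) -> float:
    """Logistic score over general annotations extended with the splice features."""
    features = list(base_features) + splice_features(deltas)
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


if __name__ == "__main__":
    weights = [0.8, 1.5, 3.0, -0.2]  # hypothetical: 2 general annotations + 2 splice features
    annotations = [1.2, 0.0]  # hypothetical general annotation values for one variant
    print(combined_score(annotations, SpliceDeltas(0.02, 0.91, 0.0, 0.10), weights, -2.0))
    print(combined_score(annotations, None, weights, -2.0))
```

Appending the splice features to the general annotations, rather than replacing them, mirrors the point above that splice-specific scores alone miss nonsense and missense effects.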

    Data Privacy in European Medical Research

    The European General Data Protection Regulation (GDPR) has applied since May 25th, 2018. It creates a uniform data protection legal framework within the EU. National and international medical research projects, regardless of whether they were started before or after the introduction of the GDPR, are obliged to follow this new regulation and implement it promptly. This raises various challenges for a large number of medical research projects. The University Medicine Greifswald commissioned this legal report, which was prepared by DIERKS+COMPANY. Two real-world research projects, the Baltic Fracture Competence Centre (BFCC) and the German Centre for Cardiovascular Research (DZHK), provide use cases, questions, and context for this legal report. It addresses questions regarding all steps of data processing. The report provides practical answers to a wide array of technical and organisational questions in the area of data protection-compliant processing of research data. A comprehensive guide to GDPR-compliant data processing has been developed, which both summarises the broad legal environment and provides specific assistance in the design and implementation of GDPR-compliant data management processes, including Informed Consent, Legal Consequences of Withdrawal, and Privacy by Design.

    Challenges to QT Interval Variability Analysis in Mobile Applications

    The QT interval in an electrocardiogram (ECG) reflects complex processes affecting the repolarization of the ventricular myocardium. Increased QT interval variability (QTV) is thought to be caused by lability of ventricular repolarization and has been associated with cardiac mortality. Recent publications have shown that template-based methods are more robust than traditional methods for beat-to-beat QT interval extraction. However, most studies are limited to ECG recordings without movement; in this study, we therefore analyze the performance of QT interval extraction on mobile, non-stationary ECG recordings. The recordings of the 7 test subjects are each at least 65 min long and contain about 25 min of sport exercise such as running, cycling, sport climbing or acrobatic training. 2DSW was used to extract, for each beat, the QT interval and the best-fit distance of the matched template, which served to evaluate signal quality. Potential relations between QTV, motion and signal quality were compared segment by segment. To quantify motion activity, we calculated the normalized signal magnitude area (SMA). QTV increased in the subjects during sport exercise, possibly reflecting sympathetic activity under these specific physiological conditions. However, increased QTV could also be caused by low signal quality.
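
The abstract names the normalized signal magnitude area (SMA) as its motion measure but not the exact QTV statistic. A minimal sketch, assuming three-axis accelerometer samples at a constant rate and using the standard deviation of beat-to-beat QT intervals as a stand-in QTV measure, computes both for one segment; the segment data are hypothetical.

```python
import statistics


def normalized_sma(ax: list[float], ay: list[float], az: list[float]) -> float:
    """Normalized SMA: time-averaged |ax| + |ay| + |az| (constant sampling rate assumed)."""
    n = len(ax)
    if n == 0 or len(ay) != n or len(az) != n:
        raise ValueError("axes must be non-empty and of equal length")
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(ax, ay, az)) / n


def qt_variability(qt_ms: list[float]) -> float:
    """Stand-in QTV measure: sample standard deviation of QT intervals (ms)."""
    if len(qt_ms) < 2:
        raise ValueError("at least two beats are required")
    return statistics.stdev(qt_ms)


if __name__ == "__main__":
    # Hypothetical segment: accelerometer samples (g) and QT intervals of its beats (ms).
    ax, ay, az = [0.1, -0.4, 0.3], [0.9, 1.1, 1.0], [0.0, 0.2, -0.1]
    qt = [402.0, 396.0, 410.0, 399.0]
    print(round(normalized_sma(ax, ay, az), 3), round(qt_variability(qt), 2))
```

Computing both values per segment, alongside the template best-fit distance, is one way to compare QTV against motion and signal quality as described above.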

    Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA

    DNA sequences determined from ancient organisms have high error rates, primarily due to uracil bases created by cytosine deamination. We use synthetic oligonucleotides, as well as DNA extracted from mammoth and Neandertal remains, to show that treatment with uracil-DNA glycosylase and endonuclease VIII removes uracil residues from ancient DNA and repairs most of the resulting abasic sites, leaving undamaged parts of the DNA fragments intact. Neandertal DNA sequences determined with this protocol have greatly increased accuracy. In addition, our results demonstrate that Neandertal DNA retains in vivo patterns of CpG methylation, potentially allowing future studies of gene inactivation and imprinting in ancient organisms.
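
After uracil-DNA glycosylase treatment, uracils from deaminated unmethylated cytosines are removed, whereas deaminated 5-methylcytosine becomes thymine, which the enzyme does not remove. Residual C-to-T differences should therefore be enriched at CpG sites, where mammalian cytosine methylation mainly occurs. The sketch below tallies C-to-T rates in CpG and non-CpG contexts; it assumes gap-free, same-strand pairs of reference and read sequences and ignores reverse-strand G-to-A changes, read-position effects and true sequence differences.

```python
def c_to_t_rates(alignments: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of reference C positions read as T, split by CpG vs non-CpG context."""
    counts = {"CpG": [0, 0], "non-CpG": [0, 0]}  # [C->T observations, reference C positions]
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must be aligned without gaps")
        ref, read = ref.upper(), read.upper()
        for i, (ref_base, read_base) in enumerate(zip(ref, read)):
            if ref_base != "C":
                continue
            is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
            tally = counts["CpG" if is_cpg else "non-CpG"]
            tally[1] += 1
            if read_base == "T":
                tally[0] += 1
    return {ctx: (ct / total if total else 0.0) for ctx, (ct, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical aligned (reference, read) pairs.
    pairs = [("ACGTCAGT", "ATGTCAGT"), ("TTCGACCA", "TTTGACCA")]
    print(c_to_t_rates(pairs))
```

An elevated CpG rate relative to the non-CpG rate in treated libraries is the kind of signal that points to retained methylation.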