19,348 research outputs found

    A targeted gene panel that covers coding, non-coding and short tandem repeat regions improves the diagnosis of patients with neurodegenerative diseases

    Genetic testing for neurodegenerative diseases (NDs) is highly challenging because of genetic heterogeneity and overlapping manifestations. Targeted gene panels (TGPs), coupled with next-generation sequencing (NGS), can facilitate the profiling of a large repertoire of ND-related genes. Due to the technical limitations inherent in NGS and TGPs, short tandem repeat (STR) variations are often ignored. However, STR expansions are known to cause NDs such as Huntington's disease and spinocerebellar ataxia type 3 (SCA3). Here, we studied the clinical utility of a custom-made TGP that targets 199 NDs and 311 ND-associated genes in 118 undiagnosed patients. At least one known or likely pathogenic variant was found in 54 patients; 27 patients had clinical profiles that matched the variants; and 16 patients had their original diagnosis refined. A high concordance of variant calling was observed when comparing the results from TGP and whole-exome sequencing of four patients. Our in-house STR detection algorithm reached a specificity of 0.88 and a sensitivity of 0.82 in our SCA3 cohort. This study also uncovered a trove of novel and recurrent variants that may enrich the repertoire of ND-related genetic markers. We propose that a combined, comprehensive TGP-bioinformatics pipeline can improve the clinical diagnosis of NDs.
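
The specificity and sensitivity quoted in this abstract are ratios over a confusion matrix. The Python sketch below shows how such figures are typically computed; the function name and the counts are illustrative assumptions and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive-class and one true negative-class sample")
    sensitivity = tp / (tp + fn)  # fraction of true expansions that were detected
    specificity = tn / (tn + fp)  # fraction of non-expanded samples correctly cleared
    return sensitivity, specificity


# Toy counts, invented for illustration only (not the study's data).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting both values matters for a diagnostic caller: a detector can reach high sensitivity simply by flagging every sample, which the specificity would then expose.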

    Identifying Structural Variation in Haploid Microbial Genomes from Short-Read Resequencing Data Using Breseq

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. Results: We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for approximately 25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Conclusions: Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
    Funding: U.S. National Institutes of Health R00-GM087550; U.S. National Science Foundation (NSF) DEB-0515729; NSF BEACON Center for the Study of Evolution in Action DBI-0939454; Cancer Prevention & Research Institute of Texas (CPRIT) RP130124; University of Texas at Austin startup funds; University of Texas at Austin; CPRIT Cancer Research Traineeship; Molecular Bioscience
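
The abstract mentions that predictions of deleted chromosomal regions are combined with junction evidence. As a rough illustration of the coverage side of that idea only, the Python sketch below scans a per-base read-depth list for runs of zero coverage. It is a simplified stand-in and not breseq's statistical model; the function name, the `min_length` parameter, and the depth values are assumptions made for this example.

```python
def zero_coverage_runs(coverage, min_length=3):
    """Return half-open (start, end) intervals where depth is 0 for at least min_length bases."""
    runs = []
    start = None  # start index of the zero-depth run currently being tracked
    for i, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the reference.
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))
    return runs


# Invented depths: one 4-base gap and one isolated zero that is too short to report.
depths = [40, 42, 0, 0, 0, 0, 41, 0, 39]
print(zero_coverage_runs(depths))
```

In a haploid clonal sample a true deletion should leave no reads at all, which is why a zero-depth run is a reasonable first candidate; a real pipeline would still need junction reads or a coverage model to separate deletions from unmappable repeats.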

    WebSTR: a population-wide database of short tandem repeat variation in humans.

    Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at http://webstr.ucsd.edu
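
The definition of an STR given in this abstract (consecutive repetitions of a one to six nucleotide motif) can be made concrete with a small Python sketch. It finds perfect tandem repeats with a regular expression; it is not the annotation method behind WebSTR, and the function name, the `min_copies` threshold, and the test sequence are assumptions chosen for illustration.

```python
import re


def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    min_copies must be at least 2. The lazy quantifier makes the regex try
    the shortest motif first, so AAAAAA is reported as motif A, not AA.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Invented sequence: four copies of CAG flanked by non-repetitive bases.
print(find_strs("TTGA" + "CAG" * 4 + "TT"))
```

Real STR callers also tolerate interrupted or imperfect repeats, which a plain backreference cannot express; the sketch covers only the exact-copy case.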

    Genetic Polymorphisms

    Around 99.9% of the human genome is identical between individuals; only about 0.1% differs. This variation accounts for the diversity in phenotypes and in susceptibility to environmental effects. DNA variants occur in numerous forms. Mutations may be defined as sequence variants that occur in less than 1% of the population, whereas more prevalent variants are known as polymorphisms. The most common genetic variants, present in more than 1% of the population, are known as single nucleotide polymorphisms (SNPs). In the human genome, SNPs are an abundant form of genetic variation; they are important because they contribute to many diseases, influence drug efficacy and side effects, and may also be useful for prophylaxis. A SNP is a specific location at which more than one nucleotide is found, usually with only two alleles at the locus. More than 100 million SNPs have been identified in humans, on average one every 300 nucleotides. A gene having more than one allele is a normal result of SNPs. SNPs are not restricted to coding sequences but may also lie in non-coding regions. Many techniques are used to analyze SNPs; they involve two phases, one for allele discrimination and another for detection.

    Mechanisms to Evade the Phagocyte Respiratory Burst Arose by Convergent Evolution in Typhoidal Salmonella Serovars.

    Typhoid fever caused by Salmonella enterica serovar (S.) Typhi differs in its clinical presentation from gastroenteritis caused by S. Typhimurium and other non-typhoidal Salmonella serovars. The different clinical presentations are attributed in part to the virulence-associated capsular polysaccharide (Vi antigen) of S. Typhi, which prevents phagocytes from triggering a respiratory burst by preventing antibody-mediated complement activation. Paradoxically, the Vi antigen is absent from S. Paratyphi A, which causes a disease that is indistinguishable from typhoid fever. Here, we show that evasion of the phagocyte respiratory burst by S. Paratyphi A required very long O antigen chains containing the O2 antigen to inhibit antibody binding. We conclude that the ability to avoid the phagocyte respiratory burst is a property distinguishing typhoidal from non-typhoidal Salmonella serovars that was acquired by S. Typhi and S. Paratyphi A independently through convergent evolution

    Evaluation of two MiniSTR loci mutation events in five Father-Mother-Child trios of Yoruba origin

    The robustness of short tandem repeats for use in forensic and paternity testing depends on their high polymorphism and mutation rate. This study sought to determine the occurrence of mutations at two miniSTR loci in the Yoruba population. Blood samples were collected from five father-mother-child trios of Yoruba origin. Two DNA extraction methods, a homemade method and the Zymogen gDNA kit, were tested for yield and purity for use in the STR assay. The DNA was amplified and resolved on a 4% agarose gel. The homemade method yielded an average DNA concentration of 1399 ng/μl, while the kit yielded 984.1 ng/μl; the absorbance ratios at 260/280 were 1.78 and 1.55, respectively. Locus D1GATA113 was detected in the father and mother of two families, A and C. D5S2500 was detected only in the male parent (father) in family D. DNA extracted using either of the two methods in this study is appropriate for use in an STR mutation assay, but the PCR conditions for miniSTR mutation loci among the Yoruba still require extensive optimization. Keywords: DNA extraction Methods, miniSTRs, mutation, Yorub

    Signatures of TOP1 transcription-associated mutagenesis in cancer and germline

    The mutational landscape is shaped by many processes. Genic regions are vulnerable to mutation but are preferentially protected by transcription-coupled repair [1]. In microorganisms, transcription has been demonstrated to be mutagenic [2,3]; however, the impact of transcription-associated mutagenesis remains to be established in higher eukaryotes [4]. Here we show that ID4, a cancer insertion–deletion (indel) mutation signature of unknown aetiology [5] characterized by short (2 to 5 base pair) deletions, is due to a transcription-associated mutagenesis process. We demonstrate that defective ribonucleotide excision repair in mammals is associated with the ID4 signature, with mutations occurring at a TNT sequence motif, implicating topoisomerase 1 (TOP1) activity at sites of genome-embedded ribonucleotides as a mechanistic basis. Such TOP1-mediated deletions occur somatically in cancer, and the ID-TOP1 signature is also found in physiological settings, contributing to genic de novo indel mutations in the germline. Thus, although topoisomerases protect against genome instability by relieving topological stress [6], their activity may also be an important source of mutations in the human genome

    Principles of Genetic Fingerprinting in Forensic Medicine

    The examination of forensic DNA is the focus of this study. Forensic biology is the analysis carried out in the biology sections of forensic laboratories. This paper aims to provide a brief overview of DNA typing and forensic DNA analysis. The laboratory procedure known as DNA fingerprinting uses the nucleotide sequences of certain regions of human DNA that are unique to each individual to establish a person's likely identity. Paternity testing, other forensic applications, and criminal investigations can all make use of DNA fingerprints. The objective in these situations is to "match" two DNA fingerprints, such as a DNA sample from a known individual and one from an unknown individual. DNA evidence is simple to collect since every human cell contains genetic material. Every individual leaves a biological trace when they come into contact with living and non-living objects, which makes it possible to identify that trace and link it to the locations where it was collected. This information can then be used in forensic science. With the ability to extract enough DNA from even the tiniest biological sample, the authorities can now match suspects to evidence collected at crime scenes, clear the innocent, and identify the real offender.

    Genetic Inconsistency in Paternity Investigation

    DNA fingerprinting is a forensic identification method with high accuracy. However, genetic inconsistency such as STR mutation in paternity testing may complicate the analysis and resolution of a case. This study aimed to analyze the presence of genetic inconsistency, or mutation of DNA markers, in paternity test cases referred to the Department of Forensic Medicine, Dr. Sardjito Hospital/Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Indonesia. A total of 58 DNA testing cases from 2008 to 2016 were analyzed. Dried bloodstain samples were collected on FTA cards after informed consent, and DNA was extracted directly from the FTA cards. Amplification was done using commercially available kits, and genotyping was done on an ABI Prism 3500 for a minimum of 15 STR loci: D8S1179, D21S11, CSF1PO, D7S820, D13S317, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA. A single mutation of STR repeats was found at each of the FGA and D12S391 loci. The mutation at the D12S391 locus is a loss of a single repeat in the paternal allele. At the FGA locus, however, it is unknown whether the mutation was a loss or a gain of a repeat, or whether it occurred in the paternal or the maternal allele. In conclusion, a single repeat mutation was observed at the FGA and D12S391 loci. Keywords: maternal, mutation, paternal allele, paternity test, ST
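
The reasoning behind calling a single-repeat mutation in a trio can be sketched in Python. The function below assumes alleles are recorded as whole-number repeat counts and that the mother is undisputed; it is a hypothetical illustration, not the laboratory's procedure, and the function name, labels, and genotypes are invented for this example.

```python
def classify_paternal_allele(child, mother, father):
    """Classify one STR locus in a father-mother-child trio.

    Each argument is a pair of allele repeat counts, e.g. (18, 21).
    Returns 'consistent', 'one-step', 'inconsistent' or 'maternal mismatch'.
    """
    best = None  # smallest repeat difference between the obligate paternal allele and the father
    for i in (0, 1):
        maternal, paternal = child[i], child[1 - i]
        if maternal not in mother:
            continue  # this assignment of alleles to parents is impossible
        diff = min(abs(paternal - allele) for allele in father)
        best = diff if best is None else min(best, diff)
    if best is None:
        return "maternal mismatch"
    if best == 0:
        return "consistent"
    if best == 1:
        return "one-step"  # explainable by gain or loss of a single repeat
    return "inconsistent"


# Invented genotypes: the child's paternal allele (21) is one repeat shorter than the father's 22.
print(classify_paternal_allele(child=(18, 21), mother=(18, 19), father=(22, 24)))
```

When the child shares no allele with the mother, or both child alleles could be maternal, the parental origin and direction of the change cannot be resolved from repeat counts alone, which is the kind of ambiguity the abstract reports for FGA.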

    Computational analysis of human genomic variants and lncRNAs from sequence data

    High-throughput sequencing technologies have been developed and applied to human genome studies for nearly 20 years. These technologies have provided numerous research applications and have significantly expanded our knowledge of the human genome. In this thesis, computational methods that use sequence data to study human genomic variants and transcripts were evaluated and developed. Indels (insertions and deletions) are two common types of genomic variant that are widespread in the human genome. Detecting indels in human genomes is a crucial step in diagnosing indel-related genomic disorders and may identify novel indel markers for studying certain diseases. Compared with previous techniques, high-throughput sequencing technologies, especially next-generation sequencing (NGS), make it possible to detect indels accurately and efficiently across wide ranges of the genome. In the first part of the thesis, tools with indel calling abilities are evaluated with an assortment of indels and different NGS settings. The results show that the selection of tools and NGS settings significantly affects indel detection, which provides suggestions for tool selection and future development. In bioinformatics analysis, an indel's position can be marked inconsistently on the reference genome, so that one indel may have different but equivalent representations, which causes problems for downstream analysis. This problem is related to the complex sequence context of indels, for example short tandem repeats (STRs), where the same short stretch of nucleotides is repeated. In the second part of the thesis, a novel computational tool, VarSCAT, is described; it has various functions for annotating the sequence context of variants, including ambiguous positions, STRs, and other sequence context features. Analysis of several high-confidence human variant sets with VarSCAT reveals that a large number of genomic variants, especially indels, have sequence features associated with STRs. In the human genome, not all genes and their transcripts are translated into proteins. Long non-coding ribonucleic acid (lncRNA) is a typical example. Sequence recognition built on machine learning models has improved significantly in recent years. In the last part of the thesis, several machine learning-based lncRNA prediction tools were evaluated on their predictions of the coding potential of transcripts. The results suggest that tools based on deep learning identify lncRNAs best.
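
The ambiguity of indel positions inside repeats, described in this abstract, can be demonstrated with a brute-force Python sketch. It lists every start position at which a deletion of a given length produces the same alternate sequence. This is not how VarSCAT works; the function names, the 0-based coordinates, and the toy reference are assumptions made for illustration.

```python
def apply_deletion(ref, pos, length):
    """Return ref with `length` bases removed starting at 0-based index pos."""
    return ref[:pos] + ref[pos + length:]


def equivalent_deletion_positions(ref, pos, length):
    """List all start positions whose deletion yields the same alternate sequence."""
    target = apply_deletion(ref, pos, length)
    return [p for p in range(len(ref) - length + 1)
            if apply_deletion(ref, p, length) == target]


# Invented reference containing three tandem copies of CA.
ref = "GCACACAT"
positions = equivalent_deletion_positions(ref, pos=1, length=2)
print(positions)                # every equivalent placement of the 2-base deletion
print("leftmost:", positions[0])  # the left-aligned representation
```

Two callers that report different entries from this list are describing the same variant, so comparisons usually fix a convention such as the leftmost position before matching calls.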