12 research outputs found

    SVenX: A highly parallelized pipeline for structural variation detection using linked read whole genome sequencing data

    Genomic rearrangements larger than 50 bp are called structural variants. As a group, they contribute to phenotypic diversity among humans and have been associated with many human disorders, including neurodevelopmental disorders and cancer. Recent advances in whole genome sequencing (WGS) technologies have made it possible to identify many more disease-causing genetic variants that are relevant in clinical diagnostics and sometimes affect treatment. Numerous approaches have been proposed to detect structural variants, but acquiring and filtering the most significant information from the multitude of variants called in the sequencing data has proven to be a challenge. Another obstacle is the high computational cost of the analyses and the difficulty of configuring and operating the software and databases. Here, we present SVenX, a highly automated and parallelized pipeline that analyzes and calls structural variants using linked-read WGS data. It performs variant calling using three different approaches, as well as variant annotation and filtering. We also introduce a new tool, SVGenT, that reanalyzes the called structural variants by performing de novo assembly of the aligned reads at the identified breakpoint junctions. By comparing assembled contigs and analyzing the read coverage between the breakpoint junctions, SVGenT improves variant classification, genotype classification, and breakpoint localization.

    Tool for detection of genomic rearrangements in humans

    Genomic rearrangements larger than 50 base pairs are referred to as structural variants (SVs), and they contribute to phenotypic differences between humans. Some of these variants have been associated with human diseases such as cancer and neurodevelopmental disorders. Recent advances in whole genome sequencing (WGS) technologies have made it possible to analyze and identify many structural variants. Yet the existing tools for analyzing these data are not perfect, and they require a fair amount of bioinformatics knowledge to operate. SVenX is a highly parallelized and automated pipeline that executes all steps from whole genome sequencing data to filtered SVs. This includes 1) verifying that all required data exist, 2) making sure no data duplications exist, 3) finding variants using different methods, and 4) annotating and filtering the detected SVs. SVenX performs 10 separate steps, including 3 different variant detection tools (also known as variant callers). Normally, these steps are performed one by one, waiting for the output of each before running the next. Not only do the programs take longer to run with this approach, it also requires an employee to execute each step. Apart from installation, SVenX takes at most a few minutes to set up and launch, and it can analyze multiple WGS samples at the same time. The whole pipeline takes about 4 to 5 days to complete and requires minimal effort and bioinformatics knowledge.

    Another challenge in SV research is not only detecting the variants but also being confident that the detected SVs are true calls. The performance of existing variant callers differs significantly. One tool can perform very well on one dataset and fail entirely to detect SVs in another, while a second tool might be good at detecting only a single type of SV. Using multiple bioinformatics methods to detect SVs has been shown to result in a higher detection rate. We have created a novel tool, SVGenT, that re-analyzes already detected SVs by performing de novo assembly. SVGenT classifies the SV type (deletion, duplication, inversion or break-end) and the genotype (homozygous or heterozygous), and updates the genomic position of the SV breakpoints. SVGenT has been tested using two datasets: one public large-scale WGS dataset and one simulated dataset with 4000 SVs. Three different variant callers were used to detect the variants before SVGenT was run on the output files. The detection rate was calculated before and after SVGenT was applied. In most cases, SVGenT improved the classification of both SV type and SV genotype.

    Master's Degree Project in Biology/Molecular Biology/Bioinformatics, 60 credits, 2017. Department of Biology, Lund University. Advisor: Anna Lindstrand, M.D., Ph.D., Karolinska Institutet.
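The abstract above notes that combining several variant callers tends to raise the detection rate. As a rough illustration of that idea only, the sketch below keeps SV calls that are supported by at least two callers. It is not SVenX or SVGenT code: the `SVCall` record, the 100 bp breakpoint tolerance, the greedy clustering, and the type labels (`DEL`, `DUP`, `INV`, `BND`) are all assumptions made for this example. Only the 50 bp size threshold comes from the abstract.

```python
from dataclasses import dataclass

MIN_SV_LENGTH = 50  # bp; the abstract defines SVs as rearrangements larger than 50 bp


@dataclass(frozen=True)
class SVCall:
    """Hypothetical, simplified record for one call made by one variant caller."""
    caller: str
    chrom: str
    start: int
    end: int
    sv_type: str  # assumed labels: "DEL", "DUP", "INV" or "BND" (break-end)


def is_long_enough(call):
    """Apply the >50 bp size rule; break-ends have no meaningful length, so keep them."""
    if call.sv_type == "BND":
        return True
    return call.end - call.start > MIN_SV_LENGTH


def same_event(a, b, tolerance):
    """Treat two calls as the same SV if type and chromosome match and both
    breakpoints lie within `tolerance` bp of each other (an assumed criterion)."""
    return (
        a.chrom == b.chrom
        and a.sv_type == b.sv_type
        and abs(a.start - b.start) <= tolerance
        and abs(a.end - b.end) <= tolerance
    )


def consensus_calls(calls, min_callers=2, tolerance=100):
    """Greedily cluster calls and keep clusters seen by at least `min_callers` callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        if not is_long_enough(call):
            continue
        for cluster in clusters:
            # Compare against the first call of the cluster as its representative.
            if same_event(cluster[0], call, tolerance):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({call.caller for call in c}) >= min_callers]


calls = [
    SVCall("A", "chr1", 1000, 5000, "DEL"),
    SVCall("B", "chr1", 1040, 4990, "DEL"),
    SVCall("C", "chr2", 200, 900, "DUP"),
    SVCall("A", "chr1", 10, 30, "DEL"),  # 20 bp: below the SV size threshold
]
supported = consensus_calls(calls)
```

With the example data, only the chr1 deletion reported by both caller A and caller B survives the two-caller requirement. Real pipelines usually work on VCF files and use more careful merging rules than this greedy comparison.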

    Identification of chromosomal rearrangements in colorectal cancer

    Academic year 2014-2015

    Cancer research is continuously shedding light on these leading causes of disease worldwide. A deeper knowledge of cancer biology is essential for finding new candidate biomarkers and therapeutics. Colorectal cancer is the most commonly seen of human malignant cancers and has the third highest mortality rate[1]. Since the release of the first human genome sequence in 2004, new techniques have revolutionised the study of genetics and its possible applications. A broad range of studies has been carried out, with single nucleotide polymorphisms and copy number variants the most intensively studied. However, other kinds of mutations involving larger parts of the genome, the so-called structural variants, have been analyzed substantially less because of technical limitations. High-throughput sequencing methods seem to have eased these restrictions. In this study, gene fusions were searched for in whole exome sequencing data from 42 paired normal and cancer tissue samples. The short reads obtained with this method were aligned against a reference genome and then analyzed with Breakdancer, a structural variant calling algorithm. After filtering to remove a high proportion of false positives, a list of 22 highly probable balanced structural variants (translocations and/or inversions) was manually reviewed, yielding a final set of 20 chromosomal rearrangements, 8 of which are considered gene fusions. In addition, one recurrent translocation reported in recent studies was found to be a false positive. Further studies that take these results into account may contribute to the discovery of new biomarkers for certain subtypes of colorectal cancer.

    Supervisor: Victor Moreno; co-supervisor: Mireia Olivell

    Human and Mycobacterium tuberculosis

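    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, College of Natural Sciences, August 2021. Joohon Sung (성주헌).

    DNA sequencing technology is a pivotal part of modern biology. To achieve cost efficiency, most sequencing platforms use a resequencing approach based on a reference genome. Because the reference genome plays an important role in mapping short reads and discovering variants in next-generation sequencing (NGS), reference genomes exist for many species. For example, in humans, GRCh (the human reference genome of the Genome Reference Consortium) has been used as the reference genome since the Human Genome Project, and in tuberculosis, H37Rv, the most studied strain, has been used as the reference genome. It was previously thought that only a single reference genome would be needed to determine an individual's genetic variants. However, there is still skepticism about whether a reference genome represents all individuals of a given species. Many researchers have pointed out the diversity of structural variation among the genomes of different ethnic or lineage groups and have reported novel genomic sequences that are absent from the reference genome but present in at least a few individuals or lineages. Indeed, in the sequencing process, information can be missed or limited through "unmapped reads" or incorrect variant calls. This study therefore attempted to identify the missing genomic regions of the reference genomes in human and Mycobacterium tuberculosis and to bridge that gap. For the human genome, the study used AK1, a highly contiguous genome assembly, to complement the parts missing from the human reference genome (GRCh38), which is composed of the genomes of more than 50 individuals, including individuals of African ancestry. To find the regions missing from GRCh38, two methods were used: directly comparing the reference genome (GRCh38) with AK1, and re-aligning to AK1 the "unmapped reads" from the whole genome data of 14 individuals (5 East Asian, 4 European, 5 African). The direct comparison between GRCh38 and AK1 used a chain file, which describes a pairwise alignment that allows gaps in both sequences; the other method re-aligned the unmapped reads to AK1. The two methods found, respectively, 3,333 unique genomic regions (size > 200 bp) that did not exist in GRCh38 and 38 estimated missing regions (regions to which unmapped reads from seven or more individuals aligned). In addition, when unmapped reads were used, an average of 0.90% of the unmapped reads across the ethnic groups newly aligned to AK1, and the alignment rate of unmapped reads from East Asian individuals was 0.95%, higher than for the other ethnic groups. For further study of the regions estimated to be missing from GRCh38, that is, AK1-specific sequences to which unmapped reads from the whole genome data of seven or more individuals aligned, the study analyzed the sequences with BLASTx to identify their functional roles and used Repeat Masker to examine the repetitive sequences in the apparently missing genomic regions. In Mycobacterium tuberculosis, the study used a different method to complement the parts missing from the reference genome. New pan-genome sequences for the Mycobacterium tuberculosis reference genome (H37Rv) were constructed: to build alternative sequences for H37Rv, sequences extracted from 176 complete genome assemblies (gap size > 50 bp) and "unmapped" reads extracted from 724 whole genome data sets were assembled de novo. As a result, 454 contigs were finalized as pan-genome sequences. To confirm the effect of the constructed pan-genome sequences, the study analyzed alignment and variant calling results in comparison with using H37Rv alone. In conclusion, this study provides a better understanding of the reference genomes and sequences of human and Mycobacterium tuberculosis.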
    In addition, it raises the need for further investigation of the regions missing from reference genomes and, in particular, shows the possibility of closing the gaps in a reference genome by using genome data of Mycobacterium tuberculosis as a practical example.

    DNA sequencing is the pivotal point of modern biology. To achieve cost efficiency, re-sequencing approaches based on reference genomes are used by the vast majority of sequencing platforms. Because reference genomes play an important role in mapping short reads and detecting variants in next generation sequencing (NGS), reference genomes exist for several species. For example, in humans, GRCh (the human reference genome of the Genome Reference Consortium) has been the reference genome since the Human Genome Project. H37Rv, the most studied strain, has been used as the reference genome in Mycobacterium tuberculosis. It was previously thought that determining individuals' genetic variants would require only a single global reference genome. However, there is some skepticism about whether reference genomes are truly representative of all individuals in a given species. Many researchers have pointed out the diversity of structural variation among different ethnic or lineage groups and have reported novel sequences that are not present in the reference genome but are present in at least a few individuals or strains. In the sequencing process, this can lead to missing or limited information through "unmapped reads", incorrect variant calling, and so on. This study attempts to bridge the gap and identify missed genomic regions of the reference genome in human and Mycobacterium tuberculosis. For the human genome, this study used a highly contiguous ethnic genome assembly (AK1) to complement missing parts in the human reference genome (GRCh38), which consists of genomes from >50 individuals, including those with African ancestry.
    To find the missing regions in GRCh38, this study both directly compared the reference genome (GRCh38) with AK1 and used "unmapped" reads from the whole genome sequencing data of fourteen individuals (5 of East Asian, 4 of European, and 5 of African ancestry). The direct comparison between GRCh38 and AK1 was performed with a chain file, which describes a pairwise alignment that allows gaps in both sequences. In the second approach, the unmapped reads were re-aligned to AK1. The two approaches identified, respectively, 3,333 unique genomic regions (size > 200 bp) of AK1 as compared to GRCh38 and 38 estimated missing regions (supported by unmapped reads from ≥ 7 individuals) that did not exist in GRCh38. On average, 0.90% of the unmapped reads were newly re-aligned to AK1. Furthermore, the alignment rate for East Asian individuals was 0.95%, which was higher than for other ethnic groups. For further research on the estimated missing regions, which were defined as unique AK1 genomic sequences aligned by unmapped reads from seven or more individuals, this study analyzed the sequences with BLASTx to identify their suggested functional roles and with Repeat Masker to examine the repetitive characteristics of the AK1 regions. In Mycobacterium tuberculosis, a different method was used to complement the missing parts of the reference genome. New pan-genome sequences for the Mycobacterium tuberculosis reference genome (H37Rv) were constructed. To build alternative sequences for H37Rv, this study assembled sequences (gap size > 50 bp) from 176 complete genome assemblies together with "unmapped" reads from 724 whole genome sequencing data sets (de novo assembly). After quality control, 454 contigs were finalized as pan-genome sequences. To identify the effects of the constructed pan-genome sequences, this study analyzed alignment and variant calling results in comparison with using H37Rv alone. Finally, this study provides a better understanding of reference genomes and sequencing.
    Also, this study raises the need for further investigation of the missing regions of reference genomes in human and Mycobacterium tuberculosis, and it illustrates the possibility of bridging the gap in the reference by using genome data of Mycobacterium tuberculosis as a practical example.

    Chapter 1. Introduction
        1.1. Overview of sequencing technology
        1.2. De novo assembly vs. Resequencing
            1.2.1. De novo assembly
            1.2.2. Resequencing
            1.2.3. Sequencing alignment
        1.3. The usage of the reference genome in sequencing data analysis
            1.3.1. Reference genome
                - Human
                - Mycobacterium tuberculosis
            1.3.2. The shortcomings of reference genome
            1.3.3. The efforts to bridge the gap on reference genomes
        1.4. Objectives
        1.5. Outline of the thesis
    Chapter 2. Finding Missing Regions with Human Reference Genome
        2.1. Introduction
        2.2. Materials and Methods
            2.2.1. Genome assembly data and making chain file between genome assemblies
            2.2.2. Comparison between the reference genome (GRCh38) and the AK1 genome with chain files
            2.2.3. Sample data
            2.2.4. The processing of unmapped reads extracted from sample files
            2.2.5. Visualization
        2.3. Results
            2.3.1. Discovery of missing information with systematic comparison between GRCh38 p.12 and AK1
            2.3.2. Profile of the "Unmapped Reads"
            2.3.3. Discovery of missing information with "unmapped reads" by realignment to AK1
            2.3.4. Verification of presence on missing regions by comparing with GRCh38 and experimenting PCR
        2.4. Discussion
    Chapter 3. Characterization of the Common Missing Genomic Regions
        3.1. Materials and Methods
            3.1.1. Sample data
            3.1.2. In silico functional search on candidate missing regions - BLAST (Basic Local Alignment Search Tool) search
            3.1.3. Identifications of transposable elements for studying the characteristics on missing regions by Repeat Masker
        3.2. Results
            3.2.1. Finding estimated functions of missing genomic regions
            3.2.2. Characteristics of candidate missing genomic regions on the repetitive sequences
            3.2.3. Identifying the occurrence mechanism of insertions related with missing genomic regions
        3.3. Discussion
    Chapter 4. Construction of a Pan-tuberculosis Reference
        4.1. Introduction
        4.2. Materials and Methods
            4.2.1. Sample data
            4.2.2. The identification of differences between complete genome data by using chain files
            4.2.3. The de novo assembly of unmapped reads from whole genome data
            4.2.4. Building pangenome reference by hybrid de novo assembly
            4.2.5. Identification of effects on alignments and variant call results with alternative sequences
        4.3. Results
            4.3.1. In silico analysis on candidate genomic gaps of 176 scaffolds based on H37Rv
            4.3.2. De novo assembly of unmapped reads from whole genome sequencing data of TB
            4.3.3. Merging gaps from complete genomes and contigs of unmapped reads using hybrid de novo assembly
            4.3.4. The effects on alignment and variant call results with final alternative sequences
        4.4. Discussion
    Chapter 5. Summary and Conclusion
        5.1. General Discussion
        5.2. Summary and Conclusions
    References
    Supplementary Materials
    Abstract in Korean
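The thesis above relies on collecting "unmapped reads" and measuring what fraction of them align to a second assembly. The sketch below illustrates that bookkeeping on SAM-formatted text, using the standard SAM FLAG bit `0x4` (segment unmapped). It is an assumption-laden illustration, not the thesis's actual workflow: the function names are hypothetical, the input is a list of SAM lines rather than a BAM file, and secondary or supplementary records are not handled separately.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_unmapped(sam_line):
    """Return True if the FLAG field (column 2) of a SAM record has bit 0x4 set."""
    fields = sam_line.rstrip("\n").split("\t")
    return bool(int(fields[1]) & UNMAPPED_FLAG)


def records(sam_lines):
    """Drop header lines (which start with '@') and keep alignment records."""
    return [line for line in sam_lines if not line.startswith("@")]


def unmapped_read_names(sam_lines):
    """Names (column 1) of reads that did not map to the first reference."""
    return [line.split("\t")[0] for line in records(sam_lines) if is_unmapped(line)]


def realignment_rate(realigned_sam_lines):
    """Percentage of records that are mapped after re-alignment to a second
    assembly. Assumes the input holds only previously unmapped reads."""
    recs = records(realigned_sam_lines)
    if not recs:
        return 0.0
    mapped = sum(1 for line in recs if not is_unmapped(line))
    return 100.0 * mapped / len(recs)


# Tiny made-up example: FLAG 4 and 77 have the unmapped bit set, 0 and 99 do not.
example = [
    "@HD\tVN:1.6",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII",
]
names = unmapped_read_names(example)
rate = realignment_rate(example)
```

In practice this filtering is normally done on BAM files with dedicated tools. The text-based version here only shows how the FLAG bit decides which reads count as unmapped and how a percentage such as the reported re-alignment rates could be computed.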

    Familial Studies in Whole Exome and Genome Sequencing

    Indiana University-Purdue University Indianapolis (IUPUI)

    Population genetics has been revolutionized by the advent of high-throughput sequencing (HTS) methods in the 21st century. Modern day sequencers are now capable of sequencing entire exomes and genomes at unprecedented speed and accuracy. An explosion of bioinformatics software and data analysis tools now makes sequencing accessible for gene discovery in both rare Mendelian and complex disease. Family-based sequencing studies in particular have great potential for elucidating the genetic basis for many more diseases. We apply both whole exome and genome sequencing to three different cases of familial disease: intracranial aneurysm (IA), Parkinson disease (PD), and X-linked ataxia dementia (XLAD). IA and PD are both common, complex traits that inflict a devastating disease burden worldwide, mostly due to few effective therapeutic interventions. Little of the heritability of both IA and PD has been explained to date, especially as it relates to the impact of rare variation on disease. XLAD is an extremely rare neurological disease described thus far in one kindred. Although promising results have been achieved through previous genetic study designs, the causative gene has not yet been identified. For all three diseases, HTS offers an opportunity to explore the role of rare variation in disease pathogenesis. In each study, we explore the opportunities and challenges of family-based HTS for different disease models. The work presented herein contributes effective practices for study design, analysis, and interpretation in a rapidly growing field still replete with questions about how best to implement HTS in studying familial disease.

    Discovering novel human structural variation from diverse populations and disease patients: an exploration of what human genomics misses by relying on reference-based analyses

    Since the completion of the human genome project, the field of genomics has relied on the human reference genome for nearly all analyses. Population genetics, disease association studies, and beyond all begin by comparing an individual’s sequenced genome to the human reference. However, the human reference genome is not only still incomplete, but also not an accurate representation of humanity; it is derived primarily from a single individual, and cannot possibly represent the scope of human diversity. By using this genome as a template, we bias our studies. In this thesis we examine large regions of structural variation between individuals that are often missed by comparing solely to the human reference genome. We use multiple strategies to uncover variation, including performing localized assembly on whole genome sequencing reads not matching the reference genome from 910 individuals of African ancestry, and utilizing new, long-read sequencing technologies in disease patients. We demonstrate that vast amounts of sequence present in human populations, nearly 300 megabases in the case of the African ancestry dataset, are missing from the reference genome, as well as that many non-reference sequences are present in breast cancer and Mendelian disease patients, which could have yet-to-be-discovered disease relevance. We find evidence of novel non-reference sequences which are genic and transcribed in many individuals, which may have functional relevance. Finally we present strategies for integrating the wealth of short-read sequencing data currently available with the limited but growing number of newer, long-read sequenced samples to gain new insights previously inaccessible using short-read data alone

    Novel insights into the genetic basis of chronic obstructive pulmonary disease

    Chronic obstructive pulmonary disease (COPD), defined as irreversible airflow limitation, is caused by a complex interaction of environmental exposures, most commonly cigarette smoke, and genetic factors. Genetic studies of COPD have used tests of genome-wide linkage and association to identify loci that contribute to disease susceptibility. However, as seen in other chronic diseases, the best-replicated loci associated with COPD only account for a small portion of disease heritability. Identifying additional genetic determinants of chronic diseases offers the opportunity to better understand their biology as well as the promise of better disease prediction and patient stratification, first steps in the development of precision medicine. The genetic architecture underlying chronic disease is complex, and it is likely that there is still common variation contributing to COPD that has been masked from association studies by phenotypic and genotypic heterogeneity. Further, there is evidence that rare variation contributes to chronic disease susceptibility, and rare variants in SERPINA1 leading to alpha-1 antitrypsin deficiency support this in COPD. The trio of studies presented in this work aim to detect both of these types of variation. In the first, we employ an extreme-trait study design to detect rare variants in the first whole genome sequencing study of COPD. Using this approach, we identify a previously unreported non-synonymous variant associated with COPD, and two suggestively associated candidate genes, PTPRO and ZNF816. In the second and third studies, we integrate mouse and human genetic data to identify undetected common variants associated with human disease and mouse models of disease. The first study uses a mouse model of cigarette smoke-induced emphysema and identifies the gene ABI3BP as a potential candidate gene. 
The second looks at early life determinants of chronic disease by measuring airspace size in mice at maturity, leading to identification of IL1R2, which plays a previously undescribed role in lung development. Finally, we demonstrate that by integrating the results of genetic studies, it is possible to gather additional information about the genetic architecture of chronic diseases like COPD

    Genome-wide approaches for identifying genes involved in the maintenance of genomic stability

    The maintenance of genomic stability and the repair of DNA damage are essential for the survival of all cells. Despite diverse pathways for repair of DNA lesions, different mutations can arise, ranging from Single Nucleotide Variants (SNVs) to larger Structural Variants (SVs). The processes that play a role in the formation of these alterations are not fully understood. In this thesis, I present two complementary approaches for accumulating genomic variants and for identifying pathways involved in the suppression of mutation formation using Saccharomyces cerevisiae (budding yeast) gene knockout strains. First, using next-generation sequencing, I studied neutral variants through a mutation accumulation assay for up to 1800 generations. I used 47 yeast strains with known defects in DNA replication, repair and recombination pathways. In all strains, small insertions and deletions (indels) were more common than larger SVs (>50bp). Most mutations occurred in repetitive sequences, implicating replication based mechanisms and homologous recombination in the formation of genomic variants. Furthermore, the knockout of MSH2 produced a hypermutable strain that acquired the highest number of indels. Moreover, the knockout of the genes SWR1 and ISW1, involved in chromatin remodeling, resulted in strains with high number of deletions. These results suggest that defects in establishing a correct chromatin architecture may play a role in the formation of genomic variants. I further performed a genome-wide screen for genes that suppress deletion formation under different drug treatments in the presence or absence of homologous repeats by using designed constructs. As expected, deletions occurred more often between repeats, in support of the frequent involvement of homologous recombination in the formation of chromosome rearrangements. In addition, I identified genes whose knockout led to increased levels of deletions. Among these, IOC4 is of particular interest given that it belongs to the same chromatin remodeling complex as ISW1, identified in the neutral mutation accumulation assay. This provides further evidence that chromatin remodeling may be involved in preventing the occurrence of SVs. Furthermore, several meiosis-related mutants also showed increased levels of deletions, suggesting that meiosis proteins may have additional roles in the maintenance of genomic stability during vegetative growth. By performing additional experimental validations, I verified the higher vulnerability of meiosis gene knockouts to acquire deletions, especially in their diploid stages. In the last chapter, I briefly describe the results of several side projects in which I applied computational methods learned through the above mentioned projects, to identify and characterize genomic rearrangements in different human cancers. In summary, I have found that genome-wide approaches can provide interesting insights into the understanding of genomic variants in yeast and human cancers. In particular, given the evolutionary conservation of the ISWI chromatin remodeling complex and meiosis-related genes, the results presented here point to potentially novel functions of these proteins in the maintenance of genomic stability.

    Next generation sequencing to identify multiple clinically relevant genetic lesions for the diagnosis of acute myeloid leukaemia

    Routine cytogenetic and molecular genetic investigations aid the diagnosis of acute myeloid leukaemia (AML) and are critical for prognostic stratification to optimise therapy and enhance survival of patients. Advances in the understanding of the genomics of AML by next generation sequencing (NGS) technology have identified a mutational landscape, which has the potential to improve risk assessment and identify new targets for therapy. This project developed a novel, custom-designed NGS panel for the resequencing of genomic DNA (gDNA), to identify multiple types of genetic lesion in AML. Regions of recurrent genetic abnormalities were targeted, to reproduce the output of conventional testing in a single assay. Solution hybrid capture and Illumina-based NGS were used to analyse 36 AML samples, without use of normal control to represent the typical diagnostic workflow. A panel of 42 genes, including those most frequently mutated, was used to test for clinically relevant abnormalities, including common duplications and gene fusions. Sequence data was analysed with a pipeline of relevant bioinformatic tools and the output was compared to standard results and sequencing from an alternative NGS platform. Following variant annotation, a total of 143 likely oncogenic variants were detected across all samples. This included all 13 NPM1 insertions, 10 FLT3-ITD, and the 7 fusion genes found by routine tests. There was strong concordance between NGS platforms for mutation detection. Multiple new findings included two KMT2A-PTD, a TP53 mutation in a patient with a complex karyotype, and a rare NUP98-DDX10 gene fusion. Patients were regrouped by a new prognostic scheme based on genomic features. Eight patients were reclassified; seven changed from the Intermediate group, three to Favourable and four to Adverse. 
The successful detection of genomic lesions demonstrated the principle that the new NGS assay could reliably detect a variety of genomic abnormalities and that it could be refined for use in the diagnostic laboratory, with the potential to rationalise multidisciplinary workflows. The feasibility of implementation is discussed. A potential clinical utility was inferred and suggests that benefit could be derived for its validation for mainstream diagnosis for the clinical management of AML

    Étude de la variabilité du génome mitochondrial comme facteur de susceptibilité au cancer du sein

    A large part of the genetic component of breast cancer risk (BCR) is still unexplained. I therefore studied whether variants of the mitochondrial genome (mtDNA) might explain part of this risk. Mitochondria are the main source of reactive oxygen species (ROS), which contribute to genomic instability and tumor development. As a first axis of research, I studied potential interactions between some nuclear and mitochondrial variants, in conjunction with alcohol consumption. Despite the large size of our dataset, the lack of a statistically significant interaction in our data might indicate that previously published results showing such interactions were not robust. I also studied whether mitochondrial haplogroups could be considered modifiers of the known association between BCR and pathogenic mutations in the BRCA1/2 genes. I identified haplogroup T1a1 as such a modifier for individuals carrying a mutated BRCA2. Finally, I characterized by NGS the mitochondrial genome of women diagnosed with familial breast cancer who tested negative for known pathogenic BRCA1/2 mutations. Several variants were identified as potentially damaging. Two genes, MT-ATP6 and MT-CYB, are specifically enriched both in the number of distinct variants and in the number of individuals carrying these variants. They are both essential structural components of the mitochondrial respiratory chain, the main source of ROS production in the cell. All these analyses enrich the knowledge about associations between BCR and the variability of mtDNA by integrating questions linked to interactions between genomic variants, environmental exposure, and effect modification related to mitochondrial haplogroups.

    A large part of the genetic component of breast cancer risk remains unexplained. I therefore studied the extent to which variants observed in the mitochondrial genome might partly explain this risk. The mitochondrion, as the cell's energy source, is an organelle involved in the synthesis of reactive oxygen species, or free radicals, which contribute to genomic instability and tumor development. A first line of research led me to study a potential interaction between variants of the mitochondrial genome and of the nuclear genome, in conjunction with alcohol consumption. I then analyzed whether mitochondrial haplogroups can be considered potential modifiers of the association between breast cancer risk and the causal mutations carried by the BRCA1 and BRCA2 genes. Haplogroup T1a1 was identified as a modifier of the risk conferred by pathogenic mutations located in the BRCA2 gene. Finally, I characterized by high-throughput sequencing the mitochondrial genome of 436 women with breast cancer and a strong family history who carried no causal mutation in BRCA1 or BRCA2. Several variants were predicted to be damaging. Two genes in particular, MT-ATP6 and MT-CYB, are specifically enriched both in the number of variants carried and in the number of individuals carrying these variants in our study. Taken together, this work has helped to enrich knowledge of the potential associations between variation in the mitochondrial genome and breast cancer risk.