16 research outputs found

    Improving CNV detection from short-read MPS data in neuromuscular disorders

    Neuromuscular disorders (NMDs) are highly heterogeneous, with around 1000 different reported subtypes. Most are genetic in origin, and some 500 genes are currently known to cause NMDs. Massively parallel sequencing (MPS) approaches have been widely used to increase the cost-effectiveness and diagnostic yield in the work-up of the genetic molecular diagnosis and to speed up the process. Copy number variants (CNVs), deletions and duplications larger than 50 base pairs, explain approximately 10% of the Mendelian disorders. No best-practice pipelines have yet been developed for CNV analysis from MPS data. Therefore, the detection and verification of CNV findings has often involved complementary methods, such as array comparative genomic hybridization (array CGH), multiplex ligation-dependent probe amplification (MLPA) and quantitative PCR approaches. Recently, various CNV detection programs have been developed, but for widely different research settings, which complicates choosing the correct approach for NMDs. These individual programs have generally exhibited less than ideal sensitivity and specificity for CNV detection. Our aim was to develop a comprehensive pipeline for the detection and annotation of CNVs with high accuracy from targeted gene panel sequencing and whole exome sequencing (WES) data of patients with NMDs. Four different CNV analysis programs were chosen for this study: CoNIFER, XHMM, ExomeDepth and CODEX. In their current versions, the targeted gene panel MYOcap includes 349 genes for myopathic disorders and MNDcap includes 302 genes for neurogenic disorders. 2359 samples were sequenced with MYOcap, 942 samples with MNDcap and 262 samples with WES. For the targeted gene panels, this included 24 positive control samples with previously characterized CNVs and 31 negative control samples with certain genes verified to not have CNVs. A detection sensitivity of 100% and specificity of 100% were reached for these control samples. 
Previously undetected CNVs from MYOcap or MNDcap sequenced samples were verified as true positive detections in 36 cases with MLPA, PCR or array CGH, and eight CNVs were verified as false positive detections. These and the positive control samples were utilized in validation of a predictive logistic regression model. In silico CNV generation into MYOcap sequenced samples provided 18,677 specific and 3892 unspecific CNV detections to initially train the model. The model was trained to differentiate true positive detections from false positive detections in order to increase the specificity of the CNV detection pipeline. The advantage of using four different CNV detection programs compared to using them individually, or with any other combination, was demonstrated by CNV detection sensitivity from the set of in silico CNVs. The predictive model with variables from all four programs provided the highest sensitivity (96.6%) and specificity (87.5%) for predicting CNV detections correctly, indicating an accuracy of 95.5% (95% CI 87.3–99.1%). The CNV detection pipeline together with the predictive model was validated for WES samples with control samples with 235 previously characterized CNVs. For CNVs spanning at least three exons, the detection sensitivity was 97.3% and the sensitivity of the predictive model was 99.3% after adjusting the model threshold for WES data. The CNV annotation platform cnvScan was expanded to contain the most recent CNV population databases as well as in-house CNV databases for all the sequenced sample sets. CNV detection results were filtered in three ways: by < 1% frequency at 90% reciprocal overlap in the common CNV population databases; by that criterion combined with < 5% frequency at 50% reciprocal overlap in the in-house CNV database; and by true positive prediction with the model. These procedures significantly decreased the workload (with 3–13% of the original CNV detections preserved) in evaluating the CNVs further regarding clinical significance. 
The added value, i.e. the additional diagnostic yield from CNVs for both the targeted gene panel sequenced samples and WES samples was estimated to be 1.9%. Altogether 39 final genetic diagnoses were solved with these CNV findings. In addition, 18 patient cases had a likely pathogenic finding, and five had a heterozygous CNV likely pathogenic for a recessive disease without association to the patient’s phenotype. The clarified cases included six different DMD deletions or duplications causing dystrophinopathies. In three sequenced familial cases, the detected CNVs in CACNA1A, SGCD and TTN genes co-segregated with the disease. One case had two separate genetic diseases, tibial muscular dystrophy (TMD) and BMD, caused by the founder mutation FINmaj in the gene TTN and a deletion in DMD. Some of the solved cases had novel findings: the second ever reported large intragenic deletion in NEB causing dominant disease, and the first CNV, an intragenic deletion, in TIA1 in a patient diagnosed with Welander distal myopathy (WDM). Some of the genes associated with NMDs are challenging to analyze from short-read sequencing data due to homology or repetitive regions. An additional script was thus written to differentiate copy numbers of the highly homologous genes, SMN1 and SMN2. Two SMN1/SMN2 copy number 0/3 control cases were successfully recognized, and five cases were identified with a possible exon 7 conversion in SMN1 and a compatible spinal muscular atrophy phenotype. The latter findings were considered likely pathogenic and are awaiting further validation on the genomic level. Comparison of CNV detections within the in-house CNV database revealed divergences in the CNV detections within the triplicate repetitive region of NEB with potentially clinically significant changes. One array CGH validated change correlated well with the nemaline rod pathology observed in the patient. 
CNV analysis utilizing MPS data from targeted gene panels and WES samples provided increased diagnostic yield as reported also in other studies on NMDs. Our multi-algorithm and -platform approach decreased the workload in variant analysis and provided more insight into the many difficult to analyze genomic regions involved in NMDs. In the future, whole genome sequencing and long-read sequencing will likely provide higher resolution for CNV detections and reveal an even wider spectrum of structural genomic variants, together with other emerging comprehensive methods, such as optical mapping.
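
The frequency filter described in this abstract depends on reciprocal overlap between a detected CNV and database entries. The following is a minimal sketch of that idea, not the cnvScan implementation: the record layout (`chrom`, `start`, `end`, `type`, `freq`), the half-open coordinates, and the function names are all assumptions made for illustration. A CNV is kept only if no database entry of the same type both meets the reciprocal-overlap threshold and reaches the frequency cutoff.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def is_rare(cnv, database, max_freq, min_overlap):
    """True if no same-type database CNV matches by reciprocal overlap
    with a frequency at or above max_freq (hypothetical record layout)."""
    for entry in database:
        if entry["chrom"] != cnv["chrom"] or entry["type"] != cnv["type"]:
            continue
        ro = reciprocal_overlap(cnv["start"], cnv["end"],
                                entry["start"], entry["end"])
        if ro >= min_overlap and entry["freq"] >= max_freq:
            return False
    return True


# Illustrative, made-up records; thresholds follow the abstract
# (< 1% frequency at 90% reciprocal overlap for population databases).
detected = {"chrom": "2", "start": 100, "end": 200, "type": "DEL"}
population_db = [
    {"chrom": "2", "start": 105, "end": 200, "type": "DEL", "freq": 0.03},
]
keep = is_rare(detected, population_db, max_freq=0.01, min_overlap=0.9)
```

The same function could be called a second time with `max_freq=0.05` and `min_overlap=0.5` against an in-house database, mirroring the second filter mentioned in the abstract.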

    Purposive variation in recordkeeping in the academic molecular biology laboratory

    This thesis presents an investigation into the role played by laboratory records in the disciplinary discourse of academic molecular biology laboratories. The motivation behind this study stems from two areas of concern. Firstly, the laboratory record has received comparatively little attention as a linguistic genre in spite of its central role in the daily work of laboratory scientists. Secondly, laboratory records have become a focus for technologically driven change through the advent of computing systems that aim to support a transition away from the traditional paper-based approach towards electronic recordkeeping. Electronic recordkeeping raises the potential for increased sharing of laboratory records across laboratory communities. However, the uptake of electronic laboratory notebooks has been, and remains, markedly low in academic laboratories. The investigation employs a multi-perspective research framework combining ethnography, genre analysis, and reading protocol analysis in order to evaluate both the organizational practices and linguistic practices at work in laboratory recordkeeping, and to examine these practices from the viewpoints of both producers and consumers of laboratory records. Particular emphasis is placed on assessing variation in the practices used by different scientists when keeping laboratory records, and on assessing the types of articulation work used to achieve mutual intelligibility across laboratory members. The findings of this investigation indicate that the dominant viewpoint held by laboratory staff other than principal investigators conceptualized laboratory records as a personal resource rather than a community archive. Readers other than the original author relied almost exclusively on the recontextualization of selected information from laboratory records into ‘public genres’ such as laboratory talks, research articles, and progress reports as the preferred means of accessing the information held in the records. 
The consistent use of summarized forms of recording experimental data rendered most laboratory records as both unreliable and of limited usability in the records management sense that they did not form full and accurate descriptions that could support future organizational activities. These findings offer a counterpoint to other studies, notably a number of studies undertaken as part of technology developments for electronic recordkeeping, that report sharing of laboratory records or assume a ‘cyberbolic’ view of laboratory records as a shared resource

    Drug development progress in Duchenne muscular dystrophy

    Duchenne muscular dystrophy (DMD) is a severe, progressive, and incurable X-linked disorder caused by mutations in the dystrophin gene. Patients with DMD lack functional dystrophin protein, which results in chronic damage to muscle fibers during contraction and thus leads to deterioration of muscle quality and loss of muscle mass over time. Although there is currently no cure for DMD, improvements in care and management could delay disease progression and improve quality of life, thereby prolonging life expectancy for these patients. Furthermore, active research efforts are ongoing to develop therapeutic strategies that target dystrophin deficiency, such as gene replacement therapies, exon skipping, and readthrough therapy, as well as strategies that target secondary pathology of DMD, such as novel anti-inflammatory compounds, myostatin inhibitors, and cardioprotective compounds. In addition, longitudinal modeling approaches have been used to characterize the progression of MRI and functional endpoints for predictive purposes to inform Go/No Go decisions in drug development. This review showcases approved drugs or drug candidates along their development paths and also provides information on primary endpoints and enrollment size of Phase 2/3 and Phase 3 trials in the DMD space.

    The Shared Genetic Architecture of Modifiable Risk for Dementia and its Influence on Brain Health

    Targeting modifiable risk factors for dementia may prevent or delay dementia. However, the mechanisms by which risk factors influence dementia remain unclear and current research often ignores commonality between risk factors. Therefore, my thesis aimed to model the shared genetic architecture of modifiable risk for dementia and explored how these shared pathways may influence dementia and brain health. I used linkage disequilibrium score regression and genomic structural equation modelling (SEM) to create a multivariate model of the shared genetics between Alzheimer’s disease (AD) and its modifiable risk factors. Although AD was genetically distinct, there was widespread genetic overlap between most of its risk factors. This genetic overlap formed an overarching Common Factor of general modifiable dementia risk, in addition to 3 subclusters of distinct sets of risk factors. Next, I performed two multivariate genome-wide association studies (GWASs) to identify the risk variants that underpinned the Common Factor and the 3 subclusters of risk factors. Together, these uncovered 590 genome-wide significant loci for the four latent factors, 34 of which were novel findings. Using post-GWAS analyses I found evidence that the shared genetics between risk factors influence a range of neuronal functions, which were highly expressed in brain regions that degenerate in dementia. Pathway analysis indicated that shared genetics between risk factors may impact dementia pathogenesis directly at specific loci. Finally, I used Mendelian randomisation to test whether the shared genetic pathways between modifiable dementia risk factors were causal for AD. I found evidence of a causal effect of the Common Factor on AD risk. Taken together, my thesis provides new insights into how modifiable risk factors for dementia interrelate on a genetic level. 
Although the shared genetics between modifiable risk factors for dementia seem to be distinct from dementia pathways on a genome-wide level, I provide evidence that they influence general brain health, and so they may increase dementia risk indirectly by altering cognitive reserve. However, I also found that shared genetics risk between risk factors in certain genomic regions may directly influence dementia pathogenesis, which should be explored in future work to determine whether these regions represent targets to prevent dementia

    Comparative analysis of germline and somatic micro-lesion mutational spectra in 17 human tumour suppressor genes

    The known somatic (N>4000) and germline (N>4000) cancer-associated mutational spectra (viz. missense and nonsense mutations, micro-deletions, micro-insertions and micro-indels <20 bp) of 17 human tumour suppressor genes (viz. APC, ATM, BRCA1, BRCA2, CDH1, CDKN2A, NF1, NF2, PTCH, PTEN, RB1, STK11, TP53, TSC1, TSC2, VHL and WT1) were compared in order to identify similarities and differences. Analysed parameters included the recurrence status of mutations, CpG mutability, Grantham difference, evolutionary conservation of affected codons, role of nonsense-mediated mRNA decay and co-location with repetitive sequence elements. Only a small proportion of the mutations (~5%) were found to be shared between the germline and soma, although the proportions varied between different types of mutation (from 11% for missense mutations to 1% for micro-indels). Shared mutations are unlikely to be coincidental and are probably indicative of underlying shared (and endogenous) mutational mechanisms. Shared missense mutations were found to be more likely to be drivers of tumorigenesis than either exclusively somatic or exclusively germline missense mutations. Shared micro-lesions combined for all genes occurred disproportionately within repetitive elements by comparison with either somatic or germline micro-lesions, consistent with an endogenous mutational mechanism. For some genes (e.g. TP53), shared CpG-dinucleotide mutations evidenced the action of an endogenous mutational mechanism (viz. methylation-mediated deamination of 5-methylcytosine) in both the soma and the germline. Differences between mutational spectra were also noted. Germline missense mutations were found to be more likely to bear relatively more drastic functional consequences by comparison with somatic missense mutations, but also more likely to be truncating mutations. 
Germline micro-lesions (combined for all genes) were also found to be more likely to be co-located with repetitive elements than somatic micro-lesions. This could be due to the germline being relatively more protected from the action of exogenous mutagens by comparison to the soma. This study of 17 human tumour suppressor genes has therefore provided a first glimpse of the similarities and differences between germline and somatic mutational spectra.
