681 research outputs found

    Auto-validation of fluorescent primer extension genotyping assay using signal clustering and neural networks

    BACKGROUND: SNP genotyping typically incorporates a review step to ensure that the genotype calls for a particular SNP are correct. For high-throughput genotyping, such as that provided by the GenomeLab SNPstream® instrument from Beckman Coulter, Inc., the manual review used for low-volume genotyping becomes a major bottleneck. The work reported here describes the application of a neural network to automate the review of results. RESULTS: We describe an approach to reviewing the quality of primer extension 2-color fluorescent reactions by clustering optical signals obtained from multiple samples and a single reaction set-up. The method evaluates the quality of the signal clusters from the genotyping results. We developed 64 scores to measure the geometry and position of the signal clusters. The expected signal distribution was represented by a distribution of a 64-component parametric vector obtained by training the two-layer neural network on a set of 10,968 manually reviewed 2D plots containing the signal clusters. CONCLUSION: The neural network approach described in this paper may be used with results from the GenomeLab SNPstream instrument for high-throughput SNP genotyping. The overall correlation with manual review was 0.844. The approach can be applied to a quality review of results from other high-throughput fluorescent-based biochemical assays.

    ENHANCing the limb: from micro- to macro-evolution

    Understanding the molecular basis of the diverse morphological forms found within and across species is a longstanding goal in evolutionary biology. One especially relevant class of cis-regulatory elements are enhancers. This is because mutations affecting enhancers tend to be tissue- or stage-specific, which allows adaptation to proceed with relatively fewer harmful side effects in other organs or tissues. In Chapter 2 I explore how enhancers help drive morphological selection response within species. We scanned the genomes of the Longshanks mice, which were selectively bred over 20 generations for a 13% increase in tibia length. Against a backdrop of polygenic response, we found the bone repressor Nkx3-2, and specifically its enhancers, to be among the strongest contributors towards increased tibia length. I used transgenics to compare the enhancer activity of the F0 and F17 alleles at 3 candidate enhancers (two near the Nkx3-2 gene; and one near the limb developmental regulator gene, Gli3). We found that both loss-of-function (Nkx3-2) and gain-of-function (Gli3) alleles contributed to the selection response. In Chapter 3, we explored an approach to study macro-evolutionary variations across species. One of the major barriers to such study is the inability to perform direct genetic crosses due to hybrid sterility. We tackle the species barrier problem by inducing mitotic recombination in vitro in hybrid embryonic stem cells (including cross-species hybrids between Mus musculus and Mus spretus). This was achieved via Blm inhibition by the small molecule ML216. We further show that the resultant mitotic recombinant cells can be used for genetic mapping by connecting tioguanine drug resistance to variations at the Hprt locus. Furthermore, in vitro recombinant stem cells can be used for rederivation of animals through laser-assisted morula injection, thus allowing the acquisition of morphological data.
Here, through a multidisciplinary approach, we show that enhancer modulation contributes to morphological diversity and selection response within species and provide a new methodology for enhancer study across species, thus enabling the study of evolutionary developmental variations in genetic backgrounds that would otherwise be challenging to obtain. Overall, these studies highlight the relevance of enhancers in morphological diversification and provide new tools for their study.

    The art of PCR assay development: data-driven multiplexing

    The present thesis describes the discovery and application of a novel methodology, named Data-Driven Multiplexing, which uses artificial intelligence and conventional molecular instruments to develop rapid, scalable and cost-effective clinical diagnostic tests. Detection of genetic material from living organisms is a biologically engineered process where organic molecules interact with each other and with chemical components to generate a meaningful signal of the presence, quantity or quality of target nucleic acids. Nucleic acid detection, such as DNA or RNA detection, identifies a specific organism based on its genetic material. In particular, DNA amplification approaches, such as for antimicrobial resistance (AMR) or COVID-19 detection, are crucial for diagnosing and managing various infectious diseases. One of the most widely used methods is Polymerase Chain Reaction (PCR), which can detect the presence of nucleic acids rapidly and accurately. The unique interaction of the genetic material and synthetic short DNA sequences called primers enables this harmonious biological process. This thesis aims to bioinformatically modulate the interaction between primers and genetic material, enhancing the diagnostic capabilities of conventional PCR instruments by applying artificial intelligence processing to the resulting signals. To achieve this goal, experiments and data from several conventional platforms, such as real-time and digital PCR, are used in this thesis, along with state-of-the-art and innovative algorithms for classification problems and final application in real-world clinical scenarios. This work exhibits a powerful technology to optimise the use of the data, conveying the following message: better use of the data in clinical diagnostics enables higher throughput of conventional instruments without the need for hardware modification, maintaining the standard practice workflows.
In Part I, a novel method to analyse amplification data is proposed. Using a state-of-the-art digital PCR instrument and multiplex PCR assays, we demonstrate the simultaneous detection of up to nine different nucleic acids in a single-well and single-channel format. This novel concept, called Amplification Curve Analysis (ACA), leverages kinetic information encoded in the amplification curve to classify the biological nature of the target of interest. This method is applied to the novel design of PCR assays for multiple detections of AMR genes and further validated with clinical samples collected at Charing Cross Hospital, London, UK. The ACA showed a high classification accuracy of 99.28% among 253 clinical isolates when multiplexing. Similar performance is also demonstrated with isothermal amplification chemistries using synthetic DNA, showing a classification accuracy of 99.9% for detecting respiratory-related infectious pathogens. In Part II, two intelligent mathematical algorithms are proposed to solve two significant challenges when developing a data-driven multiplex PCR assay. Chapter 7 illustrates the use of filtering algorithms to remove outliers from the amplification data. This demonstrates that the information contained in the kinetics of the reaction itself provides a novel way to remove non-specific and inefficient reactions. By extracting meaningful features and adding custom selection parameters to the amplification data, we increase the machine learning classifier performance of the ACA by 20% when outliers are removed. In Chapter 8, a patented algorithm called Smart-Plexer is presented. This allows the hybrid development of multiplex PCR assays by computing the optimal single primer set combination in a multiplex assay. The algorithm's effectiveness lies in using experimental laboratory data as input, avoiding heavy computation and unreliable predictions of the sigmoidal shape of PCR curves.
The output of the Smart-Plexer is an optimal assay for the simultaneous detection of seven coronavirus-related pathogens in a single well, scoring an accuracy of 98.8% in identifying the seven targets correctly among 14 clinical samples. Moreover, Chapter 9 focuses on applying novel multiplex assays in point-of-care devices and developing a new strategy for improving clinical diagnostics. In summary, inspired by the emerging requirement for more accurate, cost-effective and higher throughput diagnostics, this thesis shows that coupling artificial intelligence with assay design pipelines is crucial to address current diagnostic challenges. This requires crossing different fields, such as bioinformatics, molecular biology and data science, to develop an optimal solution and hence to maximise the value of clinical tests for nucleic acid detection, leading to more precise patient treatment and easier management of infection control.

    Recent trends in molecular diagnostics of yeast infections : from PCR to NGS

    The incidence of opportunistic yeast infections in humans has been increasing over recent years. These infections are difficult to treat and diagnose, in part due to the large number and broad diversity of species that can underlie the infection. In addition, resistance to one or several antifungal drugs in infecting strains is increasingly being reported, severely limiting therapeutic options and showcasing the need for rapid detection of the infecting agent and its drug susceptibility profile. Current methods for species and resistance identification lack satisfactory sensitivity and specificity, and often require prior culturing of the infecting agent, which delays diagnosis. Recently developed high-throughput technologies such as next generation sequencing or proteomics are opening completely new avenues for more sensitive, accurate and fast diagnosis of yeast pathogens. These approaches are the focus of intensive research, but translation into the clinic requires overcoming important challenges. In this review, we provide an overview of existing and recently emerged approaches that can be used in the identification of yeast pathogens and their drug resistance profiles. Throughout the text we highlight the advantages and disadvantages of each methodology and discuss the most promising developments in their path from bench to bedside.

    ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci

    BACKGROUND: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability. METHODS: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets. RESULTS: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage. CONCLUSIONS: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.

    SMAD4: a multifunctional regulator of limb bud initiation and outgrowth

    During mouse embryonic development, the spatio-temporal expression of genes is controlled by both interlinked signalling pathways and interactions between transcription factors and their target cis-regulatory modules. To gain global insights into the roles of a trans-acting transcriptional regulator in a specific tissue, the genome-wide profiling of its target regulatory regions and their association with the putative target genes are essential. Therefore, I have combined several types of genome-wide analyses such as ChIP-seq using epitope-tagged transcription factors with ATAC-seq and RNA-seq to study the functions of HAND2 and SMAD4 during heart and limb bud development, respectively. In Hand2-deficient embryos, we observed that cells of the atrioventricular canal do not undergo the endothelial-mesenchymal transition that underlies cardiac cushion development. By combining HAND2^3xF ChIP-seq and RNA-seq analysis, we have identified the HAND2 gene regulatory network involved in these processes and show that HAND2 is a key regulator of heart valve development. Limb bud outgrowth and patterning are regulated by a self-regulatory feedback signalling system operating between the SHH and FGF signalling pathways that critically depends on the BMP antagonist GREMLIN1. However, the establishment of these signalling feedback loops requires initiation of Gremlin1 expression by high BMP activity. For my PhD research, I have investigated the roles of the BMP signalling pathway during limb bud initiation by studying the functions of the BMP signal transducer SMAD4. By combining genome-wide SMAD4^3xF ChIP-seq, ATAC-seq and RNA-seq analyses, I am able to show that SMAD4 participates in activation of Gremlin1 expression by interacting with Grem1 coding exon 2 (a putative regulatory region). Furthermore, the identification of the SMAD4 gene regulatory network reveals multiple functions of SMAD4 during the onset of limb bud development.
In particular, SMAD4 directly regulates target genes involved in limb bud outgrowth and patterning. Rather unexpectedly, my analysis reveals that SMAD4 directly regulates cholesterol homeostasis and controls the gradient and activity of the SHH signalling pathway during early limb bud development.

    Unravelling cylindromas : a molecular dissection of CYLD defective tumours

    Ph. D. Patients with germline mutations in the tumour suppressor gene CYLD develop multiple cutaneous tumours on the head and neck; historically this has been termed “turban tumour” syndrome. Cylindromas and spiradenomas, hair follicle related tumours seen in this syndrome, cause significant clinical morbidity. Here we characterise the clinical phenotype of these patients, utilising tumour mapping to determine the location of tumours in mutation carriers from two large pedigrees. We demonstrate the disease often affects sites outwith the head and neck, and that androgen stimulated hair follicles are particularly vulnerable to tumour formation. The impact of this disease is severe, with 1 in 4 carriers of this gene undergoing complete scalp removal. To improve this outcome, we performed whole genome profiling of CYLD defective tumours, characterising genomic and transcriptomic changes to determine targetable signalling pathways. High resolution analysis using whole genome array based comparative genomic hybridisation and single nucleotide polymorphism analysis suggests that loss of heterozygosity at the CYLD locus may be the only change required for tumour phenotype. Gene expression profiling highlighted transcriptomic similarity between cylindromas and spiradenomas. Three-dimensional reconstruction in silico from serial sections of tumours demonstrated contiguous growth between cylindromas and spiradenomas, in support of this finding. In both tumour types, dysregulated tropomyosin receptor kinase (TRK) signalling was found. Consistent with this, TRKB and TRKC proteins were selectively overexpressed in the tumour samples, as demonstrated on a tissue microarray. Therapeutic utility of targeting this pathway was demonstrated by reduced viability of CYLD defective primary cell cultures in the presence of TRK inhibitors.
These preliminary data support the use of TRK inhibitors as a therapeutic strategy in severely affected CYLD mutation carriers. North East Skin Research Fund, The Newcastle Hospital Trustees, Breakthrough Breast Cancer Research, The Medical Research Council

    Identification of 31 genomic loci for autosomal recessive mental retardation and molecular genetic characterization of novel causative mutations in four genes.

    Severe mental and behavioral disorders are common, affecting 1-3% of the world populace. They thus constitute a major burden not only for the affected families but also for society. There is reason to believe that autosomal recessive mental retardation (ARMR) is more common than X-linked MR, but it has so far received considerably less attention. This is partly due to small family sizes and low consanguinity rates in industrialized societies, both of which have hampered gene mapping and identification, which is illustrated by the fact that until 2003, when this study was started, no more than one gene was shown to be implicated in non-syndromic ARMR (NS-ARMR). The work presented here is part of a larger project to shed more light on the molecular causes of ARMR as a prerequisite for diagnosis, counselling and therapy, focusing on large consanguineous Iranian families with several mentally retarded children. It combines clinical and molecular approaches such as patient recruitment, clinical characterization, sample collection, SNP array genotyping, whole genome linkage analysis, homozygosity mapping and finally mutation screening in a systematic fashion. Successful mutation detection is followed by functional analyses of the affected genes. In the study presented here, the investigation of 135 families led to the identification of 31 novel genomic loci for ARMR. Contrary to previous observations, which prima facie argued against the existence of frequently mutated genes, overlapping autozygosity regions from several families could now be observed on chromosomes 1, 5 and 19. At each of these loci a minimum of two overlapping linkage intervals were solitary in the respective families and showed a LOD score of, or above, three. Mutation screening in one of these families with NS-ARMR has led to the discovery of a new gene for NS-ARMR, TUSC3, where a mutation was found that leads to the loss of TUSC3 transcript in patient cells. 
Additional investigations in families with syndromic forms of ARMR revealed a new gene for ataxia and mild mental retardation. This gene, CA8, was found to carry an R237Q mutation, with a putatively deleterious effect on functional properties of the gene product in the affected patients. Furthermore, one novel mutation in ALDH3A2 in patients with Sjögren-Larsson syndrome and two in the MCPH1 gene in patients with primary microcephaly were found. Gene expression profiling, knockdown experiments and irradiation studies added more evidence on the involvement of MCPH1 in cell cycle control, DNA damage response and transcriptional regulation. In summary, the identification of a novel gene for NS-ARMR and many new genomic intervals with a high probability of containing different genes with disease-causing mutations is in keeping with previous results that indicated a high degree of genetic heterogeneity for this disorder. Still, the several overlapping loci found in this study now also indicate the presence of genes with an increased frequency of mutations in ARMR patients. Further studies are necessary to identify the disease-causing mutations in these newly identified linkage intervals and to determine the contribution of the affected genes to the complex processes of human cognition. These studies will be greatly facilitated by the novel high-throughput sequencing technologies that are now available and will allow a much faster pace of detection of disease-causing mutations.

    Biological Role and Disease Impact of Copy Number Variation in Complex Disease

    In the human genome, DNA variants give rise to a variety of complex phenotypes. Ranging from single base mutations to copy number variations (CNVs), many of these variants are neutral in selection and disease etiology, making difficult the detection of true common or rare frequency disease-causing mutations. However, allele frequency comparisons in cases, controls, and families may reveal disease associations. Single nucleotide polymorphism (SNP) arrays and exome sequencing are popular assays for genome-wide variant identification. To limit bias between samples, uniform testing is crucial, including standardized platform versions and sample processing. Bases occupy single points while copy variants occupy segments. Bases are bi-allelic while copies are multi-allelic. One genome also encodes many different cell types. In this study, we investigate how CNV impacts different cell types, including heart, brain and blood cells, all of which serve as models of complex disease. Here, we describe ParseCNV, a systematic algorithm specifically developed as a part of this project to perform more accurate disease associations using SNP arrays or exome sequencing-generated CNV calls with quality tracking of variants, contributing to each significant overlap signal. Red flags of variant quality, genomic region, and overlap profile are assessed in a continuous score and shown to correlate over 90% with independent verification methods. We compared these data with our large internal cohort of 68,000 subjects, with carefully mapped CNVs, which gave a robust rare variant frequency in unaffected populations. In these investigations, we uncovered a number of loci in which CNVs are significantly enriched in non-coding RNA (ncRNA), Online Mendelian Inheritance in Man (OMIM), and genome-wide association study (GWAS) regions, impacting complex disease. 
After thoroughly evaluating the variant frequencies in pediatric individuals, we compared them with those in geriatric individuals to gain insight into these variants' impact on lifespan. Longevity-associated CNVs enriched in pediatric patients were found to aggregate in alternative splicing genes. Congenital heart disease is the most common birth defect and cause of infant mortality. When comparing congenital heart disease families, with cases and controls genotyped both on SNP arrays and exome sequencing, we uncovered significant and confident loci that provide insight into the molecular basis of disease. Neurodevelopmental disease affects the quality of life and cognitive potential of many children. In neurodevelopmental and psychiatric diseases, the CACNA, GRM, CNTN, and SLIT gene families show multiple significant signals impacting a large number of developmental and psychiatric disease traits, with the potential of informing therapeutic decision-making. Through new tool development and analysis of large disease cohorts genotyped on a variety of assays, I have uncovered an important biological role and disease impact of CNV in complex disease.