    BASH: a tool for managing BeadArray spatial artefacts

    Summary: With their many replicates and their random layouts, Illumina BeadArrays provide greater scope for detecting spatial artefacts than do other microarray technologies. They are also robust to artefact exclusion, yet there is a lack of tools that can perform these tasks for Illumina. We present BASH, a tool for this purpose. BASH adopts the concepts of Harshlight, but implements them in a manner that utilizes the unique characteristics of the Illumina technology. Using bead-level data, spatial artefacts of various kinds can thus be identified and excluded from further analyses.

    Spatial normalization improves the quality of genotype calling for Affymetrix SNP 6.0 arrays

    Background: Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms. Results: Application of the spatial bias adjustments on HapMap samples resulted in higher genotype call rates with equal or even better accuracy for thousands of SNPs. Consequently the normalization procedure is expected to lead to more meaningful biological inferences and could be valuable for genome-wide SNP analysis. Conclusions: Spatial normalization can potentially rescue thousands of SNPs in a genetic study at the small cost of computational time. The approach is implemented in R and available from the authors upon request.

    Systematic Spatial Bias in DNA Microarray Hybridization Is Caused by Probe Spot Position-Dependent Variability in Lateral Diffusion

    Background: The hybridization of nucleic acid targets with surface-immobilized probes is a widely used assay for the parallel detection of multiple targets in medical and biological research. Despite its widespread application, DNA microarray technology still suffers from several biases and lack of reproducibility, stemming in part from an incomplete understanding of the processes governing surface hybridization. In particular, non-random spatial variations within individual microarray hybridizations are often observed, but the mechanisms underpinning this positional bias remain incompletely explained. Methodology/Principal Findings: This study identifies and rationalizes a systematic spatial bias in the intensity of surface hybridization, characterized by markedly increased signal intensity of spots located at the boundaries of the spotted areas of the microarray slide. Combining observations from a simplified single-probe block array format with predictions from a mathematical model, the mechanism responsible for this bias is found to be a position-dependent variation in lateral diffusion of target molecules. Numerical simulations reveal a strong influence of microarray well geometry on the spatial bias. Conclusions: Reciprocal adjustment of the size of the microarray hybridization chamber to the area of surface-bound probes is a simple and effective measure to minimize or eliminate the diffusion-based bias, resulting in increased uniformity and accuracy of quantitative DNA microarray hybridization.
    Austrian Science Fund (P18836-B17); Austrian Science Fund (P20185-B17); Austrian Science Fund (P16566-B14); Austria. Federal Ministry of Science and Research (GEN-AU III InflammoBiota); National Institutes of Health (U.S.) (1-R21-EB008844 to RS); National Science Foundation (U.S.) (OCE-0744641-CAREER)

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. Alternatively, the epigenome lies at the crossroads between genetics and the environment. Thus, there is great excitement towards the mapping of epigenetic inter-individual variation since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetic-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations faced towards its future valorisation.
    Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation with topics like mitotic stability, epigenetic reprogramming, X-inactivation or imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). The chapter ends by introducing the aims of the thesis.
    Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs) that could not be explained only by measurement error on the DNA methylation microarray. DNA methylation levels at evCpGs were shown to be stable short-term but susceptible to aging and epigenetic drift in the long-term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, discovered evCpGs can be considered as a first prototype towards a universal epigenetic fingerprint, relevant in the discrimination of MZ twins for forensic purposes, currently impossible with standard DNA profiling.
    Besides, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R-package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas on genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature since current methodologies to address them have overlooked this challenge.
    Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation but require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by solely relying on joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood. In the future, we envision how this cost-effective approach can be applied on larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era.
    Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting point methodology when performing prediction of a continuous outcome given a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof-of-concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable.
    Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the preferred methodology to translate prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world. Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former, or never smoker had been published based solely on 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput, and higher accuracy and sensitivity was missing towards translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation on these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications.
    Finally, Chapter 7 provides a general discussion on the results and topics discussed across Chapters 2-6. It begins by summarizing the main findings across the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. This chapter ends by covering the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.

    The application of genomic technologies to cancer and companion diagnostics.

    This thesis describes work undertaken by the author between 1996 and 2014. Genomics is the study of the genome, although it is also often used as a catchall phrase and applied to the transcriptome (study of RNAs) and methylome (study of DNA methylation). As cancer is a disease of the genome, the rapid advances in genomic technology, specifically microarrays and next generation sequencing, are creating a wave of change in our understanding of its molecular pathology. Molecular pathology and personalised medicine are being driven by discoveries in genomics, and genomics is being driven by the development of faster, better and cheaper genome sequencing. The next decade is likely to see significant changes in the way cancer is managed for individual cancer patients as next generation sequencing enters the clinic. In chapter 3 I discuss how ERBB2 amplification testing for breast cancer is currently dominated by immunohistochemistry (a single-gene test); and present the development, by the author, of a semi-quantitative PCR test for ERBB2 amplification. I also show that estimating ERBB2 amplification from microarray copy-number analysis of the genome is possible. In chapter 4 I present a review of microarray comparison studies, and outline the case for careful and considered comparison of technologies when selecting a platform for use in a research study. Similar, indeed more stringent, care needs to be applied when selecting a platform for use in a clinical test. In chapter 5 I present co-authored work on the development of amplicon and exome methods for the detection and quantitation of somatic mutations in circulating tumour DNA, and demonstrate the impact this can have in understanding tumour heterogeneity and evolution during treatment. I also demonstrate how next-generation sequencing technologies may allow multiple genetic abnormalities to be analysed in a single test, and in low cellularity tumours and/or heterogeneous cancers.
    Keywords: Genome, exome, transcriptome, amplicon, next-generation sequencing, differential gene expression, RNA-seq, ChIP-seq, microarray, ERBB2, companion diagnostic

    The role of miRNA regulation in cancer progression and drug resistance


    Improving Experimental Outcomes in Kinome Microarrays Through Quality Control

    Peptide microarrays consisting of defined phosphorylation target sites are an effective approach for high-throughput analysis of cellular kinase (kinome) activity. Kinome peptide arrays are highly customizable and do not require species-specific reagents to measure kinase activity, making them amenable for kinome analysis in any species. However, the data emerging from experiments with kinome peptide arrays exhibit a large amount of variability. To mitigate this issue, we introduce PIIKA 2.5 to expand upon existing software by providing three important quality control features with the aim of increasing the accuracy and consistency of kinome results, which often suffer due to the aforementioned variability. The first feature concerns the size of the virtual circle drawn around each probe in microarray analysis software (spot size). This circle creates the boundary between pixels interpreted as foreground signal and pixels interpreted as background signal. In this thesis, it is shown that too large a spot size creates abnormal data characteristics, such as high skewness (the asymmetry of the distribution of the data), that can alter downstream results. Here, a feature is presented that alerts users to the existence of improper spot size and informs them of the need to perform a manual alignment to enhance the quality of the raw intensity data, based on the skewness of the data as determined by examination of the mean and median of each dataset. The second feature uses interarray comparisons to identify outlier arrays that sometimes emerge as a consequence of technical or unknown issues. The work shown in this thesis indicates that the removal of said outlier arrays improves downstream analysis and interpretation. The third feature is a new background correction method, background scaling. Here, it is demonstrated to sharply reduce spatial biases in comparison to the most popular background correction method, background subtraction. Collectively, the modifications presented in PIIKA 2.5 allow users to identify low-quality data, improve clustering of treatment groups, reduce unintended effects, and enhance reproducibility in kinome analysis. The web-based and stand-alone versions of PIIKA 2.5 are freely accessible at http://saphire.usask.ca/saphire/piika
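
    The abstract above says the spot-size alert is based on skewness judged from the mean and median of each dataset, but it does not give the exact statistic. As a rough illustration only, the sketch below uses Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation, which is one common way to estimate skewness from those two quantities. The function names and the 0.5 threshold are hypothetical and are not taken from PIIKA 2.5.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev.

    Assumption: this stands in for whatever mean/median comparison
    PIIKA 2.5 actually performs, which the abstract does not specify.
    """
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return 0.0  # constant data has no asymmetry
    return 3.0 * (mean - median) / stdev


def flag_possible_spot_size_problem(intensities, threshold=0.5):
    """Return True when the absolute skewness exceeds a hypothetical threshold."""
    return abs(pearson_median_skewness(intensities)) > threshold


# Symmetric intensities are not flagged; a long right tail is.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 50]
print(flag_possible_spot_size_problem(symmetric))
print(flag_possible_spot_size_problem(right_tailed))
```

    In practice the threshold would need to be calibrated against arrays with known good alignment; the value here is a placeholder.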