80 research outputs found

    Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis

    Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results: Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences in integrated datasets. Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
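As a rough illustration of the simplest method named in this abstract, mean-centering, the sketch below subtracts each gene's per-platform mean from its expression values. It is not the authors' pipeline and does not implement ComBat, XPN, or DWD; the function name, the toy values, and the platform labels are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_center_by_platform(expr, platforms):
    """Centre each gene within each platform.

    expr      -- list of rows (one per gene), each a list of sample values
    platforms -- one label per sample column, e.g. "affy" or "illumina"
    """
    # Group column indices by platform label.
    columns_by_platform = defaultdict(list)
    for col, label in enumerate(platforms):
        columns_by_platform[label].append(col)

    centered = [list(row) for row in expr]
    for row_idx, row in enumerate(expr):
        for cols in columns_by_platform.values():
            # Per-gene, per-platform mean (hypothetical toy data below).
            platform_mean = mean(row[c] for c in cols)
            for c in cols:
                centered[row_idx][c] = row[c] - platform_mean
    return centered


# Hypothetical log-intensity values: two genes, two samples per platform.
expr = [[8.0, 8.2, 10.0, 10.4],
        [5.0, 5.4, 6.0, 6.2]]
platforms = ["affy", "affy", "illumina", "illumina"]
centered = mean_center_by_platform(expr, platforms)
```

After centering, every gene has a mean of zero within each platform, which removes the platform offset but, as the abstract notes for the weaker methods, does nothing to protect biological differences that happen to be confounded with platform.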

    Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments

    Background: Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories, due to the different conditions under which each dataset is generated. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference RNA (UHRR) and individual or pooled breast-tumour RNA. Results: A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was found to be greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared to 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Deliberate probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement was dependent on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA, which were derived from the samples of interest, modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform over all probes, with low GC-content probes found to be more vulnerable to batch variation than probes with a higher GC-content. Conclusions: The magnitude of systematic processing noise in a microarray experiment is variable across probes and experiments; however, it is generally the case that procedures earlier in the sample-preparation workflow are liable to introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses for the detection of bias in microarray studies.

    Integrative functional genomic analysis of human brain development and neuropsychiatric risks

    To broaden our understanding of human neurodevelopment, we profiled transcriptomic and epigenomic landscapes across brain regions and/or cell types for the entire span of prenatal and postnatal development. Integrative analysis revealed temporal, regional, sex, and cell type-specific dynamics. We observed a global transcriptomic cup-shaped pattern, characterized by a late fetal transition associated with sharply decreased regional differences and changes in cellular composition and maturation, followed by a reversal in childhood-adolescence, and accompanied by epigenomic reorganizations. Analysis of gene coexpression modules revealed relationships with epigenomic regulation and neurodevelopmental processes. Genes with genetic associations to brain-based traits and neuropsychiatric disorders (including MEF2C, SATB2, SOX5, TCF4, and TSHZ3) converged in a small number of modules and distinct cell types, revealing insights into neurodevelopment and the genomic basis of neuropsychiatric risks.

    BeadArray Expression Analysis Using Bioconductor

    Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. We cover the key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments.

    Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

    Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high-density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

    Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data

    Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. As applications, we model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We also combine our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations to accurately predict the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU). Comment: 17 pages, 4 figures, supporting information included with source
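To make the central data summary in this abstract concrete, the sketch below tabulates an observed joint frequency spectrum for two populations from per-SNP derived-allele counts. It covers only the bookkeeping of the observed spectrum, not the diffusion approximation or the composite likelihood described above; the function name and the toy counts are hypothetical.

```python
def joint_sfs(counts_pop1, counts_pop2, n1, n2):
    """Tabulate a two-population joint frequency spectrum.

    counts_pop1, counts_pop2 -- derived-allele counts per SNP in each sample
    n1, n2                   -- number of sampled chromosomes per population
    Returns a (n1 + 1) x (n2 + 1) table where entry [i][j] is the number of
    SNPs with i derived alleles in population 1 and j in population 2.
    """
    if len(counts_pop1) != len(counts_pop2):
        raise ValueError("both populations must cover the same SNPs")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts_pop1, counts_pop2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError("allele count outside sample size")
        sfs[i][j] += 1
    return sfs


# Hypothetical toy data: four SNPs, two chromosomes sampled per population.
spectrum = joint_sfs([1, 1, 0, 2], [0, 0, 2, 2], n1=2, n2=2)
```

An inference method of the kind the abstract describes would then compare a table like `spectrum` against the spectrum expected under each candidate demographic model.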

    Age related decline in female lar gibbon great call performance suggests that call features correlate with physical condition

    Background: White-handed gibbons (Hylobates lar) are small Asian apes known for living in stable territories and producing loud, elaborate vocalizations (songs), often in well-coordinated male/female duets. The female great call, the most conspicuous phrase of the repertoire, has been hypothesized to function in intra-sexual territorial defense. We therefore predicted that characteristics of the great call would correlate with a caller’s physical condition, and thus might honestly reflect resource holding potential (RHP). Because measurement of RHP is virtually impossible for wild animals, we used age as a proxy, hypothesizing that great call climaxes are difficult to produce and maintain over time, and that older adults will therefore perform lower quality great calls than young adults. To test this, we analyzed the great call climaxes of 15 wild lar gibbon females at Khao Yai National Park, Thailand and 2 captive females at Leo Conservation Center, Greenwich, CT. Results: Findings show that call climaxes correlate with female age, as young animals (n = 8, mean age: 12.9 years) produced climaxes with a higher frequency range (delta F0), maximum F0 frequency and duty cycle than old animals (n = 9, mean age: 29.6 years). A permuted discriminant function analysis also correctly classified calls by age group. During long song bouts the maximum F0 frequency of great call climaxes also decreased. Additional data support the hypothesis that short high notes, associated with rapid inhalation as an individual catches its breath, reflect increased caller effort. Older females produced more high notes than younger females, but the difference only approached statistical significance, suggesting that calling effort may be similar across different ages. Finally, for the first time in this species, we measured peak intensity of calls in captive females. They were capable of producing climaxes in excess of 100 dB at close range (2.7 m). Conclusions: Age and within-bout differences in the lar gibbon great call climax suggest that call features correlate with physical condition and thus the call may have evolved as an honest signal in the context of intra-sexual territorial defense and possibly also in male mate choice via sexual selection, although further testing of these hypotheses is necessary.

    Quantum simulation of the Hubbard model with dopant atoms in silicon

    In quantum simulation, many-body phenomena are probed in controllable quantum systems. Recently, simulation of Bose-Hubbard Hamiltonians using cold atoms revealed previously hidden local correlations. However, fermionic many-body Hubbard phenomena such as unconventional superconductivity and spin liquids are more difficult to simulate using cold atoms. To date the required single-site measurements and cooling remain problematic, while only ensemble measurements have been achieved. Here we simulate a two-site Hubbard Hamiltonian at low effective temperatures with single-site resolution using subsurface dopants in silicon. We measure quasiparticle tunneling maps of spin-resolved states with atomic resolution, finding interference processes from which the entanglement entropy and Hubbard interactions are quantified. Entanglement, determined by spin and orbital degrees of freedom, increases with increasing covalent bond length. We find separation-tunable Hubbard interaction strengths that are suitable for simulating strongly correlated phenomena in larger arrays of dopants, establishing dopants as a platform for quantum simulation of the Hubbard model. Comment: 6 pages, 5 figures. Supplementary: 13 pages, 7 figures. New version with some additional discussion, accepted in Nature Communications

    Tumour sampling method can significantly influence gene expression profiles derived from neoadjuvant window studies

    Patient-matched transcriptomic studies using tumour samples before and after treatment allow inter-patient heterogeneity to be controlled, but tend not to include an untreated comparison. Here, Illumina BeadArray technology was used to measure dynamic changes in gene expression from thirty-seven paired diagnostic core and surgically excised breast cancer biopsies obtained from women receiving no treatment prior to surgery, to determine the impact of sampling method and tumour heterogeneity. Perhaps surprisingly given the lack of treatment, consistent changes in gene expression were identified during the diagnosis-surgery interval (48 up, 2 down; Siggenes FDR 0.05) in a manner independent of both subtype and sampling-interval length. Instead, tumour sampling method was seen to directly impact gene expression, with similar effects additionally identified in six published breast cancer datasets. In contrast with previous findings, our data do not support the concept of a significant wounding or immune response following biopsy in the absence of treatment and instead implicate a hypoxic response following the surgical biopsy. Whilst sampling-related gene expression changes are evident in treated samples, they are secondary to those associated with response to treatment. Nonetheless, sampling method remains a potential confounding factor for neoadjuvant study design.