90 research outputs found

    Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis

    Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results: Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences from integrated datasets. Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.

    Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments

    Background: Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories, due to the different conditions under which each dataset is generated. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference RNA (UHRR) and individual or pooled breast-tumour RNA. Results: A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was found to be greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared to 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Deliberate probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement was dependent on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA, which were derived from the samples of interest, modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform over all probes, with low GC-content probes found to be more vulnerable to batch variation than probes with a higher GC-content. Conclusions: The magnitude of systematic processing noise in a microarray experiment is variable across probes and experiments; however, it is generally the case that procedures earlier in the sample-preparation workflow are liable to introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses for the detection of bias in microarray studies.

    Integrative functional genomic analysis of human brain development and neuropsychiatric risks

    To broaden our understanding of human neurodevelopment, we profiled transcriptomic and epigenomic landscapes across brain regions and/or cell types for the entire span of prenatal and postnatal development. Integrative analysis revealed temporal, regional, sex, and cell type-specific dynamics. We observed a global transcriptomic cup-shaped pattern, characterized by a late fetal transition associated with sharply decreased regional differences and changes in cellular composition and maturation, followed by a reversal in childhood-adolescence, and accompanied by epigenomic reorganizations. Analysis of gene coexpression modules revealed relationships with epigenomic regulation and neurodevelopmental processes. Genes with genetic associations to brain-based traits and neuropsychiatric disorders (including MEF2C, SATB2, SOX5, TCF4, and TSHZ3) converged in a small number of modules and distinct cell types, revealing insights into neurodevelopment and the genomic basis of neuropsychiatric risks.

    BeadArray Expression Analysis Using Bioconductor

    Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. We cover the key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments.

    Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

    Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high-density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

    Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data

    Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. As applications, we model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We also combine our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations to accurately predict the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU). Comment: 17 pages, 4 figures, supporting information included with sourc

    Age related decline in female lar gibbon great call performance suggests that call features correlate with physical condition

    Background: White-handed gibbons (Hylobates lar) are small Asian apes known for living in stable territories and producing loud, elaborate vocalizations (songs), often in well-coordinated male/female duets. The female great call, the most conspicuous phrase of the repertoire, has been hypothesized to function in intra-sexual territorial defense. We therefore predicted that characteristics of the great call would correlate with a caller’s physical condition, and thus might honestly reflect resource holding potential (RHP). Because measurement of RHP is virtually impossible for wild animals, we used age as a proxy, hypothesizing that great call climaxes are difficult to produce and maintain over time, and that older adults will therefore perform lower quality great calls than young adults. To test this, we analyzed the great call climaxes of 15 wild lar gibbon females at Khao Yai National Park, Thailand and 2 captive females at Leo Conservation Center, Greenwich, CT. Results: Findings show that call climaxes correlate with female age, as young animals (n = 8, mean age: 12.9 years) produced climaxes with a higher frequency range (delta F0), maximum F0 frequency and duty cycle than old animals (n = 9, mean age: 29.6 years). A permuted discriminant function analysis also correctly classified calls by age group. During long song bouts the maximum F0 frequency of great call climaxes also decreased. Additional data support the hypothesis that short high notes, associated with rapid inhalation as an individual catches its breath, reflect increased caller effort. Older females produced more high notes than younger females, but the difference only approached statistical significance, suggesting that calling effort may be similar across different ages. Finally, for the first time in this species, we measured peak intensity of calls in captive females. They were capable of producing climaxes in excess of 100 dB at close range (2.7 m). 
Conclusions: Age and within-bout differences in the lar gibbon great call climax suggest that call features correlate with physical condition and thus the call may have evolved as an honest signal in the context of intra-sexual territorial defense and possibly also in male mate choice via sexual selection, although further testing of these hypotheses is necessary.