157 research outputs found

    R-Gada: a fast and flexible pipeline for copy number analysis in association studies

    BACKGROUND: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. RESULTS: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. CONCLUSIONS: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.

    svdPPCS: an effective singular value decomposition-based method for conserved and divergent co-expression gene module identification

    BACKGROUND: Comparative analysis of gene expression profiling of multiple biological categories, such as different species of organisms or different kinds of tissue, promises to enhance the fundamental understanding of the universality as well as the specialization of mechanisms and related biological themes. Grouping genes with a similar expression pattern or exhibiting co-expression together is a starting point in understanding and analyzing gene expression data. In recent literature, gene module level analysis is advocated in order to understand biological network design and system behaviors in disease and life processes; however, practical difficulties often lie in the implementation of existing methods. RESULTS: Using the singular value decomposition (SVD) technique, we developed a new computational tool, named svdPPCS (SVD-based Pattern Pairing and Chart Splitting), to identify conserved and divergent co-expression modules of two sets of microarray experiments. In the proposed method, gene modules are identified by splitting the two-way chart coordinated with a pair of left singular vectors factorized from the gene expression matrices of the two biological categories. Importantly, the cutoffs are determined by a data-driven algorithm using the well-defined statistic, SVD-p. The implementation was illustrated on two time series microarray data sets generated from the samples of accessory gland (ACG) and malpighian tubule (MT) tissues of the line W118 of M. drosophila. Two conserved modules and six divergent modules, each of which has a unique characteristic profile across tissue kinds and aging processes, were identified. The number of genes contained in these modules ranged from five to a few hundred. Three to over a hundred GO terms were over-represented in individual modules with FDR < 0.1. One divergent module suggested the tissue-specific relationship between the expressions of mitochondrion-related genes and the aging process. This finding, together with others, may be of biological significance. The validity of the proposed SVD-based method was further verified by a simulation study, as well as by comparisons with regression analysis and cubic spline regression analysis plus PAM-based clustering. CONCLUSIONS: svdPPCS is a novel computational tool for the comparative analysis of transcriptional profiling. It especially fits the comparison of time series data of related organisms or different tissues of the same organism under equivalent or similar experimental conditions. The general scheme can be directly extended to the comparisons of multiple data sets. It can also be applied to the integration of data sets from different platforms and of different sources.

    The Effect of Algorithms on Copy Number Variant Detection

    BACKGROUND: The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery. METHODOLOGY AND PRINCIPAL FINDINGS: We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212. CONCLUSIONS AND SIGNIFICANCE: Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.
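
The two overlap definitions named in this abstract (any overlap, and an overlap of at least 40% of the smaller CNV) can be illustrated with a minimal Python sketch. This is not the study's code: the `CNV` type, the function names, the half-open coordinate convention and the example coordinates are all assumptions made for illustration.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    """A hypothetical CNV call, stored as a half-open interval [start, end)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_bp(a: CNV, b: CNV) -> int:
    """Base pairs shared by two calls (0 if disjoint or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def any_overlap(a: CNV, b: CNV) -> bool:
    """Definition 1: the two calls share at least one base pair."""
    return overlap_bp(a, b) > 0


def overlap_of_smaller(a: CNV, b: CNV, min_fraction: float = 0.40) -> bool:
    """Definition 2: the shared region covers at least min_fraction of the smaller call."""
    smaller = min(a.length, b.length)
    if smaller <= 0:
        return False
    return overlap_bp(a, b) / smaller >= min_fraction


# Made-up coordinates: a 10 kb call and a 5 kb call that share 1 kb.
call_a = CNV("chr1", 1_000, 11_000)
call_b = CNV("chr1", 10_000, 15_000)

print(any_overlap(call_a, call_b))         # True
print(overlap_of_smaller(call_a, call_b))  # False: 1 kb / 5 kb = 20%
```

The same pair of calls counts as matching under the first definition but not under the second, which is one way the choice of definition alone can change how many CNVs are reported as shared or novel.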

    Reconstructing 800 years of summer temperatures in Scotland from tree rings

    We thank The Carnegie Trust for the Universities of Scotland for providing funding for Miloš Rydval’s PhD. The Scottish pine network expansion has been an ongoing task since 2007 and funding must be acknowledged to the following projects: EU project ‘Millennium’ (017008-2), Leverhulme Trust project ‘RELiC: Reconstructing 8000 years of Environmental and Landscape change in the Cairngorms (F/00 268/BG)’ and the NERC project ‘SCOT2K: Reconstructing 2000 years of Scottish climate from tree rings (NE/K003097/1)’. This study presents a summer temperature reconstruction using Scots pine tree-ring chronologies for Scotland, allowing the placement of current regional temperature changes in a longer-term context. ‘Living-tree’ chronologies were extended using ‘subfossil’ samples extracted from nearshore lake sediments, resulting in a composite chronology > 800 years in length. The North Cairngorms (NCAIRN) reconstruction was developed from a set of composite blue intensity high-pass and ring-width low-pass chronologies with a range of detrending and disturbance correction procedures. Calibration against July-August mean temperature explains 56.4% of the instrumental data variance over 1866-2009 and is well verified. Spatial correlations reveal strong coherence with temperatures over the British Isles, parts of western Europe, southern Scandinavia and northern parts of the Iberian Peninsula. NCAIRN suggests that the recent summer-time warming in Scotland is likely not unique when compared to multi-decadal warm periods observed in the 1300s, 1500s, and 1730s, although trends before the mid-16th century should be interpreted with some caution due to greater uncertainty. Prominent cold periods were identified from the 16th century until the early 1800s, agreeing with the so-called Little Ice Age observed in other tree-ring reconstructions from Europe, with the 1690s identified as the coldest decade in the record. The reconstruction shows a significant cooling response one year following volcanic eruptions, although this result is sensitive to the datasets used to identify such events. In fact, the extreme cold (and warm) years observed in NCAIRN appear more related to internal forcing of the summer North Atlantic Oscillation.

    Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping, or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications.

    North Atlantic summer storm tracks over Europe dominated by internal variability over the past millennium

    Certain large sustained anomalies in European temperatures in the last millennium do not match estimations of external climate forcing, and are likely the result of internal climate variations. Should these anomalies occur again in the future, they could be large enough to significantly shift European temperatures away from the expected response to greenhouse forcing. Here, we use temperature observations, simulations and reconstructions over the past millennium to show that, whilst continental multidecadal mean summer temperature has varied within a span of 1 K and is primarily controlled by external forcing, subcontinental deviations from the mean, described by the temperature contrast between northern and southern Europe (the meridional temperature gradient, MTG), vary within a span of 2 K (simulation estimated) and are primarily controlled by internal climatic processes. These processes comprise internally generated redistributions of precipitation and cloud cover that are linked to vacillations in the position of the summer storm track. In contrast to the 20th century, the summer storm track has varied stochastically over the past millennium, with a weak response to external forcing. The future response of European summer temperatures to anthropogenic greenhouse forcing is likely to be spatially modulated by stochastic internal processes which have caused cool, damp summers in northern Europe over multiple periods of the last millennium, and over the last two decades.

    ATP-dependent chromatin remodeling shapes the DNA replication landscape.

    The eukaryotic DNA replication machinery must traverse every nucleosome in the genome during S phase. As nucleosomes are generally inhibitory to DNA-dependent processes, chromatin structure must undergo extensive reorganization to facilitate DNA synthesis. However, the identity of the chromatin-remodeling factors involved in replication and how they affect DNA synthesis are largely unknown. Here we show that two highly conserved ATP-dependent chromatin-remodeling complexes in Saccharomyces cerevisiae, Isw2 and Ino80, function in parallel to promote replication fork progression. As a result, Isw2 and Ino80 have especially important roles in the replication of late-replicating regions during periods of replication stress. Both Isw2 and Ino80 complexes are enriched at sites of replication, suggesting that these complexes act directly to promote fork progression. These findings identify ATP-dependent chromatin-remodeling complexes that promote DNA replication and define a specific stage of replication that requires remodeling for normal function.

    Family-Centered Preventive Intervention for Military Families: Implications for Implementation Science

    In this paper, we report on the development and dissemination of a preventive intervention, Families OverComing Under Stress (FOCUS), an eight-session family-centered intervention for families facing the impact of wartime deployments. Specific attention is given to the challenges of rapidly deploying a prevention program across diverse sites, as well as to key elements of implementation success. FOCUS, developed by a UCLA-Harvard team, was disseminated through a large-scale demonstration project funded by the United States Bureau of Navy Medicine and Surgery (BUMED), beginning in 2008 at 7 installations and expanding to 14 installations by 2010. Data are presented to describe the range of services offered, as well as initial intervention outcomes. It proved possible to develop the intervention rapidly and to deploy it consistently and effectively.

    A Snapshot of CNVs in the Pig Genome

    Recent studies of mammalian genomes have uncovered the extent of copy number variation (CNV) that contributes to phenotypic diversity, including health and disease status. Here we report a first account of CNVs in the pig genome, covering the parts of chromosomes 4, 7, 14, and 17 that have already been sequenced and assembled. A custom tiling oligonucleotide array with a median probe spacing of 409 bp was used to screen 12 unrelated Duroc boars that are founders of a large family material. After a strict CNV calling pipeline, 37 copy number variable regions (CNVRs) across all four chromosomes were identified, with five CNVRs overlapping segmental duplications, three overlapping pig unigenes and one overlapping a RefSeq pig mRNA. This CNV snapshot analysis is the first of its kind in the porcine genome and constitutes the basis for a better understanding of porcine phenotypes and genotypes, with the prospect of identifying important economic traits.

    Identification of Copy Number Variants Defining Genomic Differences among Major Human Groups

    BACKGROUND: Understanding the genetic contribution to phenotype variation of human groups is necessary to elucidate differences in disease predisposition and response to pharmaceutical treatments in different human populations. METHODOLOGY/PRINCIPAL FINDINGS: We have investigated the genome-wide profile of structural variation on pooled samples from the three populations studied in the HapMap project by comparative genome hybridization (CGH) in different array platforms. We have identified and experimentally validated 33 genomic loci that show significant copy number differences from one population to the other. Interestingly, we found an enrichment of genes related to environment adaptation (immune response, lipid metabolism and extracellular space) within these regions, and the study of expression data revealed that more than half of the copy number variants (CNVs) translate into gene-expression differences among populations, suggesting that they could have functional consequences. In addition, the identification of single nucleotide polymorphisms (SNPs) that are in linkage disequilibrium with the copy number alleles allowed us to detect evidence of population differentiation and recent selection at the nucleotide variation level. CONCLUSIONS: Overall, our results provide a comprehensive view of relevant copy number changes that might play a role in phenotypic differences among major human populations, and generate a list of interesting candidates for future studies.