High Differentiation among Eight Villages in a Secluded Area of Sardinia Revealed by Genome-Wide High Density SNPs Analysis
To better design association studies for complex traits in isolated populations, it is important to understand how history and isolation moulded the genetic features of different communities. Population isolates should not be considered homogeneous a priori, even if the communities are close to one another and part of a small region. We studied a particular area of Sardinia called Ogliastra, characterized by the presence of several distinct villages that differ in history, immigration events and population size. Cultural and geographic isolation characterized the history of these communities. We determined LD parameters in 8 villages and defined population structure through high-density SNPs (about 360 K) on 360 unrelated people (45 selected samples from each village). These isolates showed differences in LD values and LD map length. Five of these villages show high LD values, probably due to their reduced population size and extreme isolation. High genetic differentiation among villages was detected. Moreover, population structure analysis revealed a high correlation between genetic and geographic distances. Our study indicates that history, geography and biodemography have influenced the genetic features of Ogliastra communities, producing differences in LD and population structure. All these data demonstrate that we can consider each village an isolate with specific characteristics. We suggest that, in order to optimize the study design of complex traits, a thorough characterization of genetic features is useful to identify the presence of sub-populations and stratification within genetic isolates.
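As a rough illustration of what "LD parameters" can mean in practice, the sketch below computes D, D' and r² for a pair of biallelic SNPs from haplotype and allele frequencies. The abstract does not say how LD was estimated; the function name `ld_stats` and its inputs are hypothetical, and the formulas are the standard textbook definitions rather than the authors' pipeline.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    (Hypothetical inputs; phased haplotype frequencies are assumed.)
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # D' normalises D by its maximum attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For example, `ld_stats(0.5, 0.5, 0.5)` describes two SNPs whose alleles always occur together, while `ld_stats(0.25, 0.5, 0.5)` describes two independent SNPs.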
crs4/Galaxy4Developers: July 2017
Training material for the ELIXIR-IIB course on "Galaxy for Bioinformatics tool developers"
https://crs4.github.io/Galaxy4Developers
Scaling with the flow: advantages of a MapReduce-based scalable and high-throughput sequencing workflow
The continuous increase in sequencing throughput imposes a new generation of tools for data processing. The alternative is to continue suffering scalability problems in processing workflows and IT infrastructure. We evaluate the advantages that the CRS4 Sequencing and Genotyping Platform (CSGP), equipped with 6 Illumina sequencers, gained by replacing its conventional workflow with a new one based on Seal (http://biodoop-seal.sf.net) and Hadoop. The former was a standard pipeline that demultiplexed samples, aligned reads with BWA, removed duplicates with Picard and recalibrated base qualities with GATK. It parallelized computation through concurrent jobs, using a centralized file system to share data. This implementation showed weaknesses as the workload increased: low parallelism; I/O bottleneck at central storage; failure of entire analyses due to node failures or transient cluster problems. The new workflow is a custom, distributed pipeline based on the open-source Seal suite, which provides a set of tools (including a distributed BWA aligner) that run on the Hadoop MapReduce framework, leveraging its functionality for genomic sequencing applications. By switching to a Seal-based workflow we have acquired computational scalability out-of-the-box. Therefore, we can now easily meet the demands imposed by the growing sequencing platform by adding more computing nodes. In addition, the much-increased parallelism has improved overall computational throughput by taking advantage of all available computing power. Notably, we drastically sped up alignment and duplicates removal by 5x without adding computation nodes; adding nodes would result in additional throughput. Moreover, the effort required by our operators to run the analyses has been reduced, since Hadoop transparently handles most hardware and transient network problems and provides a friendly web interface to monitor job progress and logs. 
Finally, we eliminated the need for our expensive shared parallel storage devices. Our tests reveal that Seal is efficient, achieving close to 70% of the theoretical maximum throughput per node (measured with a single-node version of the workflow on a small data set) and scales linearly at least up to 128 nodes. In summary, this case study suggests that the MapReduce programming model, Seal and Hadoop provide considerable benefits in the genomic sequencing domain. Seal now includes our new workflow as a downloadable sample application.
Presented on 2011-10-11 at the 12th International Congress of Human Genetics and the American Society of Human Genetics 61st Annual Meeting, October 11–15, 2011, Montreal, Canada.
Confirmation of a new phenotype in an individual with a variant in the last part of exon 30 of CREBBP
We report here a novel de novo missense variant affecting the last amino acid of exon 30 of CREBBP [NM_004380, c.5170G>A; p.(Glu1724Lys)] in a 17-year-old boy presenting with mild intellectual disability and dysmorphisms but not resembling the phenotype of classical Rubinstein–Taybi syndrome. The patient was markedly overweight from early infancy and had cortical heterotopias. Recently, 22 individuals have been reported with missense mutations in the last part of exon 30 and the beginning of exon 31 of CREBBP, showing this new phenotype. This additional case further delineates the genotype–phenotype correlations within the molecular and phenotypic spectrum of variants in CREBBP and EP300.
Novel ANKRD11 gene mutation in an individual with a mild phenotype of KBG syndrome associated to a GEFS+ phenotypic spectrum: a case report
Background: KBG syndrome is a very rare autosomal dominant disorder, characterized by macrodontia, distinctive craniofacial findings, skeletal findings, post-natal short stature, and developmental delays, sometimes associated with seizures and EEG abnormalities. So far, over 100 cases of KBG syndrome have been reported. Case presentation: Here, we describe two sisters of a non-consanguineous family, both presenting generalized epilepsy with febrile seizures (GEFS+), and one with a more complex phenotype associated with mild intellectual disability, skeletal and dental anomalies. Whole exome sequencing (WES) analysis in all the family members revealed a heterozygous SCN9A mutation, p.(Lys655Arg), shared among the father and the two probands, and a novel de novo loss of function mutation in the ANKRD11 gene, p.(Tyr1715*), in the proband with the more complex phenotype. The reassessment of the phenotypic features confirmed that the patient fulfilled the proposed diagnostic criteria for KBG syndrome, although complicated by early-onset isolated febrile seizures. EEG abnormalities with or without seizures have been reported previously in some KBG cases. The shared variant, occurring in SCN9A, has been previously found in several individuals with GEFS+ and Dravet syndrome. Conclusions: This report describes a novel de novo variant in ANKRD11 causing a mild phenotype of KBG syndrome and further supports the association of a monogenic pattern of SCN9A mutations with GEFS+. Our data expand the allelic spectrum of ANKRD11 mutations, providing the first Brazilian case of KBG syndrome.
Furthermore, this study offers an example of how WES has been instrumental in allowing us to better dissect the clinical phenotype under study, which is a multilocus variation aggregating in one proband rather than a phenotypic expansion associated with a single genomic locus. This underscores the role of multiple rare variants at different loci in the etiology of clinical phenotypes, which complicates the diagnostic path. The successful identification of the causal variant in a gene may not be sufficient, making it necessary to identify other variants that fully explain the clinical picture. The prevalence of blended phenotypes from multiple monogenic disorders is currently unknown and will require a systematic re-analysis of large WES datasets for proper diagnosis in daily practice.
Factor Correspondence Analysis comparing different individuals from different Ogliastra villages, performed with 5,192 SNPs.
Each individual is represented by a circle, and the 8 communities are marked with different colours. The two plots in each row show factors 1 and 2 and factors 1 and 3. (A1, A2) Analysis performed with 8 communities; (B1, B2) Analysis performed with 6 communities, without Talana and Urzulei.
Endogamy (values in %) in the Ogliastra villages from 1676 to 1975.
Fst values computed for each pairwise comparison, calculated on 5262 SNPs evenly spaced at 500 kb intervals. Corresponding 95% confidence intervals, shown in parentheses, were determined by permutation testing (1000 permutations).
The levels of statistical significance were tested by performing 1000 permutations. All comparisons were highly significant (P < 10⁻³).
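The permutation test mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the caption does not say which Fst estimator was used, so the sketch assumes a simple heterozygosity-based estimator (total versus within-population heterozygosity, combined across SNPs as a ratio of sums), and the function names `fst` and `permutation_p_value` are hypothetical.

```python
import random


def fst(pop1, pop2):
    """Heterozygosity-based Fst between two populations.

    pop1, pop2 : lists of individuals; each individual is a list of
                 genotypes coded as 0/1/2 copies of one allele per SNP.
    (Assumed estimator; the source does not name the one it used.)
    """
    n1, n2 = len(pop1), len(pop2)
    num = den = 0.0
    for j in range(len(pop1[0])):
        # Allele frequency in each population and in the pooled sample.
        p1 = sum(ind[j] for ind in pop1) / (2 * n1)
        p2 = sum(ind[j] for ind in pop2) / (2 * n2)
        p = (n1 * p1 + n2 * p2) / (n1 + n2)
        # Expected heterozygosity: total, and size-weighted within-population.
        h_t = 2 * p * (1 - p)
        h_s = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / (n1 + n2)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def permutation_p_value(pop1, pop2, n_perm=1000, seed=0):
    """Return (observed Fst, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        # Reassign individuals to the two populations at random.
        rng.shuffle(pooled)
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 1000 permutations the smallest p-value this sketch can return is 1/1001, which matches the order of magnitude of the threshold quoted in the caption.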
Distribution of Linkage Disequilibrium on chromosome 22.
Average D' (A) and r² (B) coefficients plotted in 0.5 Mb sliding windows (0.25 Mb overlap).
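The windowing described in this caption can be sketched as below. It is only an illustration under stated assumptions: each LD value is assumed to be attached to a single base-pair position (for example the midpoint of a SNP pair, which the caption does not specify), and the function name `sliding_window_means` is hypothetical. A 0.5 Mb window with 0.25 Mb overlap corresponds to a step of 0.25 Mb.

```python
def sliding_window_means(points, window=500_000, step=250_000):
    """Average values in overlapping windows along a chromosome.

    points : iterable of (position_bp, value) pairs, e.g. D' or r^2
             assigned to one position each (an assumption of this sketch)
    window : window width in base pairs (0.5 Mb by default)
    step   : distance between window starts (0.25 Mb gives 0.25 Mb overlap)

    Returns a list of (window_start_bp, mean_value) for non-empty windows.
    """
    points = sorted(points)
    if not points:
        return []
    last_pos = points[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        # Half-open interval [start, start + window).
        vals = [v for pos, v in points if start <= pos < start + window]
        if vals:
            results.append((start, sum(vals) / len(vals)))
        start += step
    return results
```

Because consecutive windows overlap by half their width, most positions contribute to two window averages, which smooths the plotted curve.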