27 research outputs found
Automated and traceable processing for large-scale high-throughput sequencing facilities
Scaling up production in medium and large high-throughput sequencing facilities presents a number of challenges. As the rate of samples to process increases, manually performing and tracking the centerâs operations becomes increasingly difficult, costly and error prone, while processing the massive amounts of data poses significant computational challenges. We present our ongoing work to automate and track all data-related procedures at the CRS4 Sequencing and Genotyping Platform, while integrating state-of-the-art processing technologies such as Hadoop, OMERO, iRODS, and Galaxy into our automated workflows. Currently, the core system is in its testing phase and it is on schedule to be in production use at CRS4 by May 2013. The results thus far obtained are encouraging and the authors are confident that the CRS4 Platform will increase its efficiency and capacity thanks to this system. In the near future, the integration components will be released as as open source software.23-24Pubblicat
Recommended from our members
Training Infrastructure as a Service
Background Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. Findings Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. Conclusions TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training
Exome sequencing in Crisponi/CISS-like individuals reveals unpredicted alternative diagnoses
Crisponi/coldâinduced sweating syndrome (CS/CISS) is a rare autosomal recessive disorder characterized by a complex phenotype (hyperthermia and feeding difficulties in the neonatal period, followed by scoliosis and paradoxical sweating induced by cold since early childhood) and a high neonatal lethality. CS/CISS is a genetically heterogeneous disorder caused by mutations in CRLF1 (CS/CISS1), CLCF1 (CS/CISS2) and KLHL7 (CS/CISSâlike). Here, a whole exome sequencing approach in individuals with CS/CISSâlike phenotype with unknown molecular defect revealed unpredicted alternative diagnoses. This approach identified putative pathogenic variations in NALCN, MAGEL2 and SCN2A. They were already found implicated in the pathogenesis of other syndromes, respectively the congenital contractures of the limbs and face, hypotonia, and developmental delay syndrome, the SchaafâYang syndrome, and the early infantile epileptic encephalopathyâ11 syndrome. These results suggest a high neonatal phenotypic overlap among these disorders and will be very helpful for clinicians. Genetic analysis of these genes should be considered for those cases with a suspected CS/CISS during neonatal period who were tested as mutation negative in the known CS/CISS genes, because an expedited and corrected diagnosis can improve patient management and can provide a specific clinical followâup
Genome-wide association study of susceptibility loci for breast cancer in Sardinian population
Abstract
Background
Despite progress in identifying genes associated with breast cancer, many more risk loci exist. Genome-wide association analyses in genetically-homogeneous populations, such as that of Sardinia (Italy), could represent an additional approach to detect low penetrance alleles.
Methods
We performed a genome-wide association study comparing 1431 Sardinian patients with non-familial, BRCA1/2-mutation-negative breast cancer to 2171 healthy Sardinian blood donors. DNA was genotyped using GeneChip Human Mapping 500Â K Arrays or Genome-Wide Human SNP Arrays 6.0. To increase genomic coverage, genotypes of additional SNPs were imputed using data from HapMap Phase II. After quality control filtering of genotype data, 1367 cases (9 men) and 1658 controls (1156 men) were analyzed on a total of 2,067,645 SNPs.
Results
Overall, 33 genomic regions (67 candidate SNPs) were associated with breast cancer risk at the pâ<â10â6 level. Twenty of these regions contained defined genes, including one already associated with breast cancer risk: TOX3. With a lower threshold for preliminary significance to pâ<â10â5, we identified 11 additional SNPs in FGFR2, a well-established breast cancer-associated gene. Ten candidate SNPs were selected, excluding those already associated with breast cancer, for technical validation as well as replication in 1668 samples from the same population. Only SNP rs345299, located in intron 1 of VAV3, remained suggestively associated (p-value, 1.16x10â5), but it did not associate with breast cancer risk in pooled data from two large, mixed-population cohorts.
Conclusions
This study indicated the role of TOX3 and FGFR2 as breast cancer susceptibility genes in BRCA1/2-wild-type breast cancer patients from Sardinian population
Genome-wide association study of susceptibility loci for breast cancer in Sardinian population.
BACKGROUND: Despite progress in identifying genes associated with breast cancer, many more risk loci exist. Genome-wide association analyses in genetically-homogeneous populations, such as that of Sardinia (Italy), could represent an additional approach to detect low penetrance alleles. METHODS: We performed a genome-wide association study comparing 1431 Sardinian patients with non-familial, BRCA1/2-mutation-negative breast cancer to 2171 healthy Sardinian blood donors. DNA was genotyped using GeneChip Human Mapping 500 K Arrays or Genome-Wide Human SNP Arrays 6.0. To increase genomic coverage, genotypes of additional SNPs were imputed using data from HapMap Phase II. After quality control filtering of genotype data, 1367 cases (9 men) and 1658 controls (1156 men) were analyzed on a total of 2,067,645 SNPs. RESULTS: Overall, 33 genomic regions (67 candidate SNPs) were associated with breast cancer risk at the p <â 0(-6) level. Twenty of these regions contained defined genes, including one already associated with breast cancer risk: TOX3. With a lower threshold for preliminary significance to p < 10(-5), we identified 11 additional SNPs in FGFR2, a well-established breast cancer-associated gene. Ten candidate SNPs were selected, excluding those already associated with breast cancer, for technical validation as well as replication in 1668 samples from the same population. Only SNP rs345299, located in intron 1 of VAV3, remained suggestively associated (p-value, 1.16 x 10(-5)), but it did not associate with breast cancer risk in pooled data from two large, mixed-population cohorts. CONCLUSIONS: This study indicated the role of TOX3 and FGFR2 as breast cancer susceptibility genes in BRCA1/2-wild-type breast cancer patients from Sardinian population
Tools and data services registry: a community effort to document bioinformatics resources
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand.
Here we present a community-driven curation effort, supported by ELIXIRâthe European infrastructure for biological informationâthat aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners.
As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools
Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis
A genome wide association scan of ~6.6 million genotyped or imputed variants in 882 Sardinian Multiple Sclerosis (MS) cases and 872 controls suggested association of CBLB gene variants with disease, which was confirmed in 1,775 cases and 2,005 controls (overall P =1.60 Ă 10-10). CBLB encodes a negative regulator of adaptive immune responses and mice lacking the orthologue are prone to experimental autoimmune encephalomyelitis, the animal model of MS
Simulating Cardiac Electrophysiology Using Unstructured All-Hexahedra Spectral Elements
We discuss the application of the spectral element method to the monodomain and bidomain equations describing propagation of cardiac action potential. Models of cardiac electrophysiology consist of a system of partial differential equations coupled with a system of ordinary differential equations representing cell membrane dynamics. The solution of these equations requires solving multiple length scales due to the ratio of advection to diffusion that varies among the different equations. High order approximation of spectral elements provides greater flexibility in resolving multiple length scales. Furthermore, spectral elements are extremely efficient to model propagation phenomena on complex shapes using fewer degrees of freedom than its finite element equivalent (for the same level of accuracy). We illustrate a fully unstructured all-hexahedra approach implementation of the method and we apply it to the solution of full 3D monodomain and bidomain test cases. We discuss some key elements of the proposed approach on some selected benchmarks and on an anatomically based whole heart human computational model
crs4/Galaxy4Developers: July 2017
Training material for the ELIXIR-IIB course on "Galaxy for Bioinformatics tool developers"
https://crs4.github.io/Galaxy4Developers
The PARIGA server for real time filtering and analysis of reciprocal BLAST results
BLAST-based similarity searches are commonly used in several applications involving both nucleotide and protein sequences. These applications span from simple tasks such as mapping sequences over a database to more complex procedures as clustering or annotation processes. When the amount of analysed data increases, manual inspection of BLAST results become a tedious procedure. Tools for parsing or filtering BLAST results for different purposes are then required. We describe here PARIGA (http://resources.bioinformatica.crs4.it/pariga/), a server that enables users to perform all-against-all BLAST searches on two sets of sequences selected by the user. Moreover, since it stores the two BLAST output in a python-serialized-objects database, results can be filtered according to several parameters in real-time fashion, without re-running the process and avoiding additional programming efforts. Results can be interrogated by the user using logical operations, for example to retrieve cases where two queries match same targets, or when sequences from the two datasets are reciprocal best hits, or when a query matches a target in multiple regions. The Pariga web server is designed to be a helpful tool for managing the results of sequence similarity searches. The design and implementation of the server renders all operations very fast and easy to use