BioModels: ten-year anniversary
BioModels (http://www.ebi.ac.uk/biomodels/) is a repository of mathematical models of biological processes. A large set of models is curated to verify both correspondence to the biological process that the model seeks to represent, and reproducibility of the simulation results as described in the corresponding peer-reviewed publication. Many models submitted to the database are annotated, cross-referencing their components to external resources such as database records, and terms from controlled vocabularies and ontologies. BioModels comprises two main branches: one is composed of models derived from literature, while the second is generated through automated processes. BioModels currently hosts over 1200 models derived directly from the literature, as well as in excess of 140 000 models automatically generated from pathway resources. This represents an approximate 60-fold growth for literature-based model numbers alone, since BioModels’ first release a decade ago. This article describes updates to the resource over this period, which include changes to the user interface, the annotation profiles of models in the curation pipeline, major infrastructure changes, the ability to perform online simulations and the availability of model content in Linked Data form. We also outline planned improvements to cope with a diverse array of new challenges.
Toward community standards and software for whole-cell modeling
Whole-cell (WC) modeling is a promising tool for biological research, bioengineering, and medicine. However, substantial work remains to create accurate, comprehensive models of complex cells. Methods: We organized the 2015 Whole-Cell Modeling Summer School to teach WC modeling and evaluate the need for new WC modeling standards and software by recoding a recently published WC model in SBML. Results: Our analysis revealed several challenges to representing WC models using the current standards. Conclusion: We, therefore, propose several new WC modeling standards, software, and databases. Significance: We anticipate that these new standards and software will enable more comprehensive models.
Novel transcript discovery in yeast using sequencing reads from Oxford Nanopore Technologies in comparison to Next Generation Sequencing
Academic year 2017-2018
The advent of Oxford Nanopore Technologies is expected to play a pivotal role in sequencing studies in the near future. Omics data analysis and related processes, currently bound to Next Generation Sequencing, can now be complemented and improved by the rise of Nanopore devices, tools and pipelines.
We have explored the advantages and limitations of applying Nanopore technologies to the reconstruction of transcriptomes and to the discovery of unannotated gene sequences in yeast. Taking Saccharomyces cerevisiae as our focal species, comprehensive pipelines have been developed, optimised and compared with other processing approaches, also generated in-house. The comparisons of the newly suggested pipelines with existing Illumina-based strategies are of particular interest.
As a result of this research project, we provide insights into the impact of different technologies and pipelines on transcriptomics data processing and analysis. In addition, reproducible code and technical details are made available along with the report.
The size and composition of haplotype reference panels impact the accuracy of imputation from low-pass sequencing in cattle
Background
Low-pass sequencing followed by sequence variant genotype imputation is an alternative to the routine microarray-based genotyping in cattle. However, the impact of haplotype reference panels and their interplay with the coverage of low-pass whole-genome sequencing data have not been sufficiently explored in typical livestock settings where only a small number of reference samples is available.
Methods
Sequence variant genotyping accuracy was compared between two variant callers, GATK and DeepVariant, in 50 Brown Swiss cattle with sequencing coverages ranging from 4- to 63-fold. Haplotype reference panels of varying sizes and composition were built with DeepVariant based on 501 individuals from nine breeds. High-coverage sequence data for 24 Brown Swiss cattle were downsampled to between 0.01- and 4-fold to mimic low-pass sequencing. GLIMPSE was used to infer sequence variant genotypes from the low-pass sequencing data using different haplotype reference panels. The accuracy of the sequence variant genotypes that were inferred from low-pass sequencing data was compared with sequence variant genotypes called from high-coverage data.
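The downsampling step described above amounts to keeping a fraction of the reads equal to the target coverage divided by the original coverage. The following is a minimal sketch of that arithmetic only. The abstract does not say which tool was used for downsampling, and the function name and the 20-fold starting coverage in the usage note are illustrative assumptions.

```python
def downsample_fraction(original_coverage: float, target_coverage: float) -> float:
    """Return the fraction of reads to keep to reach the target coverage.

    Assumes reads are sampled uniformly at random, so coverage scales
    linearly with the number of reads kept.
    """
    if original_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    if target_coverage > original_coverage:
        raise ValueError("cannot downsample to a higher coverage")
    return target_coverage / original_coverage
```

For a hypothetical sample sequenced at 20-fold coverage, `downsample_fraction(20.0, 4.0)` gives the fraction of reads to retain for a 4-fold low-pass dataset.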
Results
DeepVariant was used to establish bovine haplotype reference panels because it outperformed GATK in all evaluations. Within-breed haplotype reference panels were more accurate and efficient to impute sequence variant genotypes from low-pass sequencing than equally-sized multibreed haplotype reference panels for all target sample coverages and allele frequencies. F1 scores greater than 0.9, which indicate high harmonic means of recall and precision of called genotypes, were achieved with 0.25-fold sequencing coverage when large breed-specific haplotype reference panels (n = 150) were used. In absence of such large within-breed haplotype panels, variant genotyping accuracy from low-pass sequencing could be increased either by adding non-related samples to the haplotype reference panel or by increasing the coverage of the low-pass sequencing data. Sequence variant genotyping from low-pass sequencing was substantially less accurate when the reference panel lacked individuals from the target breed.
Conclusions
Variant genotyping is more accurate with DeepVariant than GATK. DeepVariant is therefore suitable to establish bovine haplotype reference panels. Medium-sized breed-specific haplotype reference panels and large multibreed haplotype reference panels enable accurate imputation of low-pass sequencing data in a typical cattle breed.
ISSN: 0999-193X; ISSN: 1297-968
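The F1 score mentioned in the Results is the harmonic mean of precision and recall. A minimal sketch of that calculation follows. The function names and the counts in the usage note are illustrative assumptions and are not taken from the study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from true positive, false positive
    and false negative counts. Returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 90 correctly called genotypes, 10 false calls and 10 missed calls, precision and recall are both 0.9, so the F1 score is also 0.9. Because the harmonic mean is pulled toward the lower of the two values, an F1 score above 0.9 requires both precision and recall to be high.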
D9.3 Report on implementation of value-added user applications and cohort integration
A range of RESTful APIs has been developed by the EGA to allow power users, consortia and research institutions to interconnect programmatically with our system. Whilst the majority of users perform discrete and limited queries/submissions to the system, the majority of the queries/submissions are performed by a small set of advanced users. Taking this handful of teams as a reference, we describe here the different ways in which the EGA APIs can be used for a better, faster and more complete experience.
As described in the previous deliverable (9.2), the following programmatic endpoints are available and documented for all EGA users: submission API, public metadata API and private metadata API. The main difference between the latter two is that authentication is required for complete discovery of user-related metadata.
The EGA Submission API can be used either directly (https://ega-archive.org/submission/programmatic_submissions/submitting-metadata) or through a tool or interface built on top of it. EGA developed its own interface (Submitter Portal - https://ega-archive.org/submission/tools/submitter-portal), but it is encouraging to see other important partners directly leveraging the API for their own solutions. We focus our attention on the ICGCsub, software provided by the ICGC consortium for all its projects worldwide for a harmonised and smooth submission to the EGA: https://github.com/icgc-dcc/egasub
Metadata can also be retrieved programmatically and in a custom manner from the EGA: https://ega-archive.org/metadata/how-to-use-the-api. This flexibility in obtaining and filtering results allows (1) the generation of reports by users, who do not need to keep internal track of their submissions (and hence avoid duplications/mismatches), and (2) the discovery of crucial information about the data, a useful filter before applying for and downloading controlled-access data. The first functionality can also be extended to private metadata objects (i.e. those in draft status) upon authentication (https://ega-archive.org/submission/programmatic_submissions/how-to-use-the-api). Internationally known research institutions with established data/metadata submission systems (Broad Institute, DKFZ) consistently regard this feature as a main way of gathering information about their submissions.
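The report use case above (spotting duplicated submissions) could be sketched as follows. This is a minimal illustration that works on metadata records already retrieved and parsed into dictionaries. It does not call the EGA API, and the field names `id` and `alias` are hypothetical placeholders, not the documented response schema.

```python
from collections import defaultdict


def find_duplicate_aliases(records: list[dict]) -> dict[str, list[str]]:
    """Group records by alias and return aliases that occur more than once.

    Each record is assumed (hypothetically) to carry an "id" and an
    "alias" key. The result maps each duplicated alias to the list of
    record ids that share it, in input order.
    """
    ids_by_alias: dict[str, list[str]] = defaultdict(list)
    for record in records:
        ids_by_alias[record["alias"]].append(record["id"])
    return {alias: ids for alias, ids in ids_by_alias.items() if len(ids) > 1}
```

A submitter could run such a check over the retrieved metadata before creating new objects, instead of maintaining a separate internal record of past submissions.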
eQTL mapping in Brown Swiss bulls to identify variants associated with male fertility
Fertility is an essential component of the livestock industry. In cattle, numerous QTL for male reproductive success fall within regulatory regions. However, the effects of these loci have not been investigated in detail or on a large scale. Here, we assemble a sizeable cohort of mature bulls to detect expression quantitative trait loci (eQTL) and assess their effects on fertility-related genes. To do this, we sequenced genomes and total RNA from the testes of 72 bulls. We recovered 13,185,795 DNA sequence variants with minor allele frequency >5%, an average of 283,587,831 clean RNA reads per sample and 18,528 testis-expressed genes (TPM>0.2 in 75% of samples). In total, 2,178 genes had significant cis-eQTL at a false discovery rate of 5% (11.76% of expressed genes). Several genes associated with fertility, including SPATA4 and SPATA22 (which are responsible for reproductive processes), had significant cis-eQTL and variation in transcript abundance across genotypes.
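The expression filter and the reported percentage can be illustrated with a short sketch. It assumes that "TPM>0.2 in 75% of samples" means at least 75% of samples; the function name and the example TPM values in the usage note are hypothetical.

```python
def is_expressed(tpm_values: list[float],
                 min_tpm: float = 0.2,
                 min_fraction: float = 0.75) -> bool:
    """Return True if TPM exceeds min_tpm in at least min_fraction of samples."""
    if not tpm_values:
        return False
    n_above = sum(1 for value in tpm_values if value > min_tpm)
    return n_above / len(tpm_values) >= min_fraction


# Share of expressed genes with a significant cis-eQTL, using the counts
# stated in the abstract (2,178 of 18,528 genes).
eqtl_share_percent = 100 * 2178 / 18528
```

For a hypothetical gene with TPM values `[0.3, 0.5, 0.1, 0.4]`, three of four samples exceed 0.2, so the gene passes the filter. Rounding `eqtl_share_percent` to two decimals reproduces the 11.76% stated in the abstract.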