
    A First Generation Microsatellite- and SNP-Based Linkage Map of Jatropha

    Jatropha curcas is a candidate plant species for biodiesel production. However, its seed yield is too low for profitable biodiesel production, so genetic improvement through breeding is essential to raise productivity. A linkage map is an important component of molecular breeding. We established a first-generation linkage map using a mapping panel of two backcross populations with 93 progeny. We mapped 506 markers (216 microsatellites and 290 SNPs from ESTs) onto 11 linkage groups. The total map length was 1440.9 cM, with an average marker spacing of 2.8 cM. BLAST searches of 222 Jatropha ESTs containing polymorphic SSR or SNP markers against EST databases revealed that 91.0%, 86.5% and 79.2% of the Jatropha ESTs were homologous to counterparts in castor bean, poplar and Arabidopsis, respectively. Mapping 192 orthologous markers to the assembled whole-genome sequence of Arabidopsis thaliana identified 38 syntenic blocks and revealed that small linkage blocks were well conserved but often shuffled. The first-generation linkage map and the comparative mapping data could lay a solid foundation for QTL mapping of agronomic traits, marker-assisted breeding, and cloning of genes responsible for phenotypic variation.
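    The marker total and average spacing quoted above can be checked with simple arithmetic. This worked equation assumes the reported average is the total map length L divided by the number of mapped markers n; the abstract does not state which divisor was used, and dividing instead by the 495 inter-marker intervals (506 markers minus one per linkage group) would give about 2.91 cM.

```latex
\begin{aligned}
n &= 216 + 290 = 506\ \text{markers} \\
\bar{d} &= \frac{L}{n} = \frac{1440.9\ \text{cM}}{506} \approx 2.848\ \text{cM} \quad (\text{reported as } 2.8\ \text{cM}) \\
\bar{d}_{\text{interval}} &= \frac{L}{n - 11} = \frac{1440.9\ \text{cM}}{495} \approx 2.911\ \text{cM}
\end{aligned}
```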

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of molecular function and biological process annotations have slightly improved over time, those of cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the top methods perform significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, considerable room and need for improvement remain. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
    Peer reviewed

    Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight

    Current cluster-scale genomics data processing solutions rely on big data frameworks such as Apache Spark, Hadoop and HDFS for data scheduling, processing and storage. These frameworks add computation and memory overheads by default, and scaling genomics dataset processing beyond 32 nodes has been observed to be inefficient on them. To overcome these inefficiencies when processing genomics data on clusters, we introduce a low-overhead and highly scalable solution on a SLURM-based HPC batch system. This solution uses Apache Arrow as an in-memory columnar data format to store genomics data efficiently and Arrow Flight as a network protocol to move and schedule this data across the HPC nodes with low communication overhead. As a use case, we apply it to pre-processing and variant calling of NGS short-read DNA sequencing data. This solution outperforms existing Apache Spark-based big data solutions in terms of both computation time (2x faster) and communication overhead (more than 20-60% lower, depending on cluster size). Our solution has performance similar to MPI-based HPC solutions, with the added advantages of easy programmability and transparent big data scalability. The whole solution is based on Python and shell scripts, which makes it easy to update and to integrate alternative variant callers. Our solution is publicly available on GitHub: https://github.com/abs-tudelft/time-to-fly-high/tree/main/genomics
    Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
    Computer Engineering
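    To make the idea of an in-memory columnar format concrete, the sketch below pivots row-oriented read records into one array per field, which is conceptually the layout Arrow uses. It does not use Apache Arrow or Arrow Flight: Python's standard `array` module stands in for Arrow's contiguous buffers, and the `Read` record, its fields and the sample values are hypothetical.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical minimal short-read record, for illustration only.
    name: str
    chrom: str
    pos: int
    mapq: int  # mapping quality, 0-255


def to_columns(reads):
    """Pivot row-oriented records into one sequence per field (column-oriented)."""
    return {
        "name": [r.name for r in reads],
        "chrom": [r.chrom for r in reads],
        "pos": array("q", (r.pos for r in reads)),    # contiguous signed 64-bit ints
        "mapq": array("B", (r.mapq for r in reads)),  # contiguous unsigned bytes
    }


reads = [
    Read("r1", "chr1", 10468, 60),
    Read("r2", "chr1", 10512, 0),
    Read("r3", "chr2", 5003, 37),
]
batch = to_columns(reads)

# A filter such as "mapping quality >= 30" scans only the mapq column,
# without touching names, chromosomes or positions.
keep = [i for i, q in enumerate(batch["mapq"]) if q >= 30]
print([batch["name"][i] for i in keep])  # ['r1', 'r3']
```

    The design point is that an operation on one field reads a single contiguous buffer, which is also what lets a columnar batch be shipped between nodes without per-record serialization.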

    Inter-comparison of MAX-DOAS measurements of tropospheric HONO slant column densities and vertical profiles during the CINDI-2 campaign

    We present an inter-comparison of delta slant column densities (SCDs) and vertical profiles of nitrous acid (HONO) derived from measurements by different multi-axis differential optical absorption spectroscopy (MAX-DOAS) instruments, using different inversion algorithms, during the Second Cabauw Inter-comparison campaign for Nitrogen Dioxide measuring Instruments (CINDI-2) in September 2016 at Cabauw, the Netherlands (51.97° N, 4.93° E). The HONO vertical profiles, vertical column densities (VCDs), and near-surface volume mixing ratios are compared between different MAX-DOAS instruments and profile inversion algorithms for the first time. Systematic and random discrepancies of the HONO results are derived by comparing all data sets against their median values. Systematic discrepancies of HONO delta SCDs are in the range of ±0.3×10¹⁵ molec. cm⁻², half the typical random discrepancy of 0.6×10¹⁵ molec. cm⁻². For a typical high HONO delta SCD of 2×10¹⁵ molec. cm⁻², the relative systematic and random discrepancies are about 15 % and 30 %, respectively. The inter-comparison of HONO profiles shows that both systematic and random discrepancies of HONO VCDs and near-surface volume mixing ratios (VMRs) are mostly in the range of ∼±0.5×10¹⁴ molec. cm⁻² and ∼±0.1 ppb (typically ∼20 %). Furthermore, we find that the discrepancies of the retrieved HONO profiles are dominated by discrepancies of the HONO delta SCDs; the profile retrievals themselves contribute only ∼5 % to the discrepancies of the HONO profiles. However, some data sets with substantially larger discrepancies than the typical values indicate that inappropriate implementations of profile inversion algorithms and configurations of radiative transfer models in the profile retrievals can also be an important source of uncertainty.
    In addition, estimates of the measurement uncertainties of HONO dSCDs, which can significantly affect profile retrievals using the optimal estimation method, need to consider not only DOAS fit errors but also atmospheric variability, especially for an instrument with a DOAS fit error lower than ∼3×10¹⁴ molec. cm⁻². The MAX-DOAS results during the CINDI-2 campaign indicate that peak HONO levels (e.g. near-surface VMRs of ∼0.4 ppb) often appeared in the early morning and below 0.2 km. The near-surface VMRs retrieved from the MAX-DOAS observations are compared with those measured by a co-located long-path DOAS instrument; the systematic differences are smaller than 0.15 ppb during early morning and 0.07 ppb around noon. Since true HONO values at high altitudes are unknown in the absence of actual measurements, we evaluated the ability of profile inversion algorithms to respond to different HONO profile shapes through sensitivity studies using synthetic HONO delta SCDs simulated by a radiative transfer model with assumed HONO profiles. The tests indicate that profile inversion algorithms based on the optimal estimation method, with proper configurations, can reproduce the different HONO profile shapes well. Therefore, we conclude that the features of HONO accumulated near the surface derived from MAX-DOAS measurements are expected to represent the ambient HONO profiles well.
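    The relative discrepancies quoted in this abstract follow from dividing the absolute delta SCD discrepancies by the typical high delta SCD of 2×10¹⁵ molec. cm⁻². The worked equation below only restates that arithmetic with the abstract's own values; the symbols Δ_sys and Δ_rand for the systematic and random discrepancies are labels introduced here, and the units (molec. cm⁻²) cancel in each ratio.

```latex
\begin{aligned}
\frac{\Delta_{\text{sys}}}{\text{dSCD}} &= \frac{0.3\times10^{15}}{2\times10^{15}} = 0.15 = 15\,\% \\
\frac{\Delta_{\text{rand}}}{\text{dSCD}} &= \frac{0.6\times10^{15}}{2\times10^{15}} = 0.30 = 30\,\% \\
\frac{\Delta_{\text{sys}}}{\Delta_{\text{rand}}} &= \frac{0.3\times10^{15}}{0.6\times10^{15}} = 0.5
\end{aligned}
```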