88 research outputs found

    Yabi: An online research environment for grid, high performance and cloud computing

    Background There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute and data intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools and inability for users to simply manage files across heterogenous HPC storage infrastructure. Results In this paper, we describe the architecture of a software system that is adaptable to a range of both pluggable execution and data backends in an open source implementation called Yabi. Enabling seamless and transparent access to heterogenous HPC environments at its core, Yabi then provides an analysis workflow environment that can create and reuse workflows as well as manage large amounts of both raw and processed data in a secure and flexible way across geographically distributed compute resources. Yabi can be used via a web-based environment to drag-and-drop tools to create sophisticated workflows. Yabi can also be accessed through the Yabi command line which is designed for users that are more comfortable with writing scripts or for enabling external workflow environments to leverage the features in Yabi. Configuring tools can be a significant overhead in workflow environments. Yabi greatly simplifies this task by enabling system administrators to configure as well as manage running tools via a web-based environment and without the need to write or edit software programs or scripts. 
In this paper, we highlight Yabi's capabilities through a range of bioinformatics use cases that arise from large-scale biomedical data analysis. Conclusion The Yabi system encapsulates considered design of both execution and data models, while abstracting technical details away from users who are not skilled in HPC and providing an intuitive drag-and-drop scalable web-based workflow environment where the same tools can also be accessed via a command line. Yabi is currently in use and deployed at multiple institutions and is available at http://ccg.murdoch.edu.au/yabi

    Design of a framework for the deployment of collaborative independent rare disease-centric registries: Gaucher disease registry model

    Orphan drug clinical trials are often adversely affected by a lack of high-quality treatment efficacy data that can be reliably compared across large patient cohorts derived from multiple governmental and country jurisdictions. It is critical that these patient data be captured with limited corporate involvement. For some time, there have been calls to develop collaborative, non-proprietary, patient-centric registries for post-market surveillance of aspects related to orphan drug efficacy. There is an urgent need for the development and sustainable deployment of these ‘independent’ registries that can capture comprehensive clinical, genetic and therapeutic information on patients with rare diseases. We therefore extended an open-source registry platform, the Rare Disease Registry Framework (RDRF), to establish an Independent Rare Disease Registry (IRDR). We engaged with an established rare disease community for Gaucher disease to determine system requirements, methods of data capture, consent, and reporting. A non-proprietary IRDR model is presented that can serve as an autonomous data repository, but more importantly ensures that the relevant data can be made available to appropriate stakeholders in a secure, timely and efficient manner to improve clinical decision-making and the lives of those with a rare disease.

    Evidence of a tick RNAi pathway by comparative genomics and reverse genetics screen of targets with known loss-of-function phenotypes in Drosophila

    Background The Arthropods are a diverse group of organisms including Chelicerata (ticks, mites, spiders), Crustacea (crabs, shrimps), and Insecta (flies, mosquitoes, beetles, silkworm). The cattle tick, Rhipicephalus (Boophilus) microplus, is an economically significant ectoparasite of cattle affecting cattle industries worldwide. With the availability of sequence reads from the first Chelicerate genome project (the Ixodes scapularis tick) and extensive R. microplus ESTs, we investigated evidence for putative RNAi proteins and studied RNA interference in tick cell cultures and adult female ticks, targeting Drosophila homologues with a known cell viability phenotype. Results We screened 13,643 R. microplus ESTs and I. scapularis genome reads to identify RNAi-related proteins in ticks. Our analysis identified 31 RNAi proteins including a putative tick Dicer, RISC-associated proteins (Ago-2 and FMRp), an RNA-dependent RNA polymerase (EGO-1) and 23 homologues implicated in dsRNA uptake and processing. We selected 10 R. microplus ESTs with >80% similarity to D. melanogaster proteins associated with cell viability for RNAi functional screens in both BME26 R. microplus embryonic cells and female ticks in vivo. Only genes associated with proteasomes had an effect on cell viability in vitro. In vivo RNAi showed that 9 genes had significant effects, either causing lethality or impairing egg laying. Conclusion We have identified key RNAi-related proteins in ticks which, along with our loss-of-function studies, support a functional RNAi pathway in R. microplus. Our preliminary studies indicate that tick RNAi pathways may differ from those of other Arthropods such as insects.

    The complexity of Rhipicephalus (Boophilus) microplus genome characterised through detailed analysis of two BAC clones

    Background Rhipicephalus (Boophilus) microplus (Rmi), a major cattle ectoparasite and tick-borne disease vector, impacts on animal welfare and industry productivity. In arthropod research there is an absence of a complete Chelicerate genome, which includes ticks, mites, spiders, scorpions and crustaceans. Model arthropod genomes such as Drosophila and Anopheles are too taxonomically distant for a reference in tick genomic sequence analysis. This study focuses on the de novo assembly of two R. microplus BAC sequences from the understudied R. microplus genome. Based on available R. microplus sequenced resources and comparative analysis, tick genomic structure and functional predictions identify complex gene structures and genomic targets expressed during tick-cattle interaction. Results In our BAC analyses we have assembled, using the correct positioning of BAC end sequences and transcript sequences, two challenging genomic regions. Cot DNA fractions compared to the BAC sequences confirmed a highly repetitive BAC sequence, BM-012-E08, and a low repetitive BAC sequence, BM-005-G14, which was gene rich and contained short interspersed elements (SINEs). Based directly on the BAC and Cot data comparisons, the genome-wide frequency of the SINE Ruka element was estimated. Using a conservative approach to the assembly of the highly repetitive BM-012-E08, the sequence was de-convoluted into three repeat units, each unit containing an 18S, 5.8S and 28S ribosomal RNA (rRNA) encoding gene sequence (rDNA), related internal transcribed spacer and complex intergenic region. In the low repetitive BM-005-G14, a novel gene complex was found between two genes on the same strand. Nested in the second intron of a large 9 Kb papilin gene was a helicase gene. This helicase overlapped in two exonic regions with the papilin. Both these genes were shown to be expressed in different tick life stages important in ectoparasite interaction with the host. Tick-specific sequence differences were also determined for the papilin gene and the protein binding sites of the 18S subunit in a comparison to Bos taurus. Conclusion In the absence of a sequenced reference genome we have assembled two complex BAC sequences and characterised novel gene structure that was confirmed by gene expression and sequencing analyses. This is the first report to provide evidence for 2 eukaryotic genes with exon regions that overlap on the same strand, the first to describe Rhipicephalinae papilin, and the first to report the complete ribosomal DNA repeated unit sequence structure for ticks. The Cot data estimation of genome-wide sequence frequency means this research will underpin future efforts for genome sequencing and assembly of the R. microplus genome.

    Genome sequencing and analysis of the paclitaxel-producing endophytic fungus Penicillium aurantiogriseum NRRL 62431

    Background Paclitaxel (Taxol™) is an important anticancer drug with a unique mode of action. The biosynthesis of paclitaxel had been considered restricted to the Taxus species until it was discovered in Taxomyces andreanae, an endophytic fungus of T. brevifolia. Subsequently, paclitaxel was found in hazel (Corylus avellana L.) and in several other endophytic fungi. The distribution of paclitaxel in plants and endophytic fungi and the reported sequence homology of key genes in paclitaxel biosynthesis between plant and fungi species raises the question about whether the origin of this pathway in these two physically associated groups could have been facilitated by horizontal gene transfer. Results The ability of the endophytic fungus of hazel Penicillium aurantiogriseum NRRL 62431 to independently synthesize paclitaxel was established by liquid chromatography-mass spectrometry and proton nuclear magnetic resonance. The genome of Penicillium aurantiogriseum NRRL 62431 was sequenced and gene candidates that may be involved in paclitaxel biosynthesis were identified by comparison with the 13 known paclitaxel biosynthetic genes in Taxus. We found that paclitaxel biosynthetic gene candidates in P. aurantiogriseum NRRL 62431 have evolved independently and that horizontal gene transfer between this endophytic fungus and its plant host is unlikely. Conclusions Our findings shed new light on how paclitaxel-producing endophytic fungi synthesize paclitaxel, and will facilitate metabolic engineering for the industrial production of paclitaxel from fungi

    The TREAT-NMD DMD Global Database: analysis of more than 7,000 Duchenne muscular dystrophy mutations.

    Analyzing the type and frequency of patient-specific mutations that give rise to Duchenne muscular dystrophy (DMD) is an invaluable tool for diagnostics, basic scientific research, trial planning, and improved clinical care. Locus-specific databases allow for the collection, organization, storage, and analysis of genetic variants of disease. Here, we describe the development and analysis of the TREAT-NMD DMD Global database (http://umd.be/TREAT_DMD/). We analyzed genetic data for 7,149 DMD mutations held within the database. A total of 5,682 large mutations were observed (80% of total mutations), of which 4,894 (86%) were deletions (1 exon or larger) and 784 (14%) were duplications (1 exon or larger). There were 1,445 small mutations (smaller than 1 exon, 20% of all mutations), of which 358 (25%) were small deletions and 132 (9%) small insertions and 199 (14%) affected the splice sites. Point mutations totalled 756 (52% of small mutations) with 726 (50%) nonsense mutations and 30 (2%) missense mutations. Finally, 22 (0.3%) mid-intronic mutations were observed. In addition, mutations were identified within the database that would potentially benefit from novel genetic therapies for DMD including stop codon read-through therapies (10% of total mutations) and exon skipping therapy (80% of deletions and 55% of total mutations)
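    The category totals quoted in the abstract above can be cross-checked with simple arithmetic. The following is a minimal Python sketch and not part of the original study; the variable names and the `pct` helper are illustrative assumptions, and only counts quoted in the abstract are used.

```python
# Counts copied from the abstract; all names here are hypothetical.
total_mutations = 7149
large_mutations = 5682
small_mutations = 1445
mid_intronic = 22

# Breakdown of the small mutations as listed in the abstract.
small_breakdown = {
    "small deletions": 358,
    "small insertions": 132,
    "splice site": 199,
    "nonsense": 726,
    "missense": 30,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# The three top-level categories should add up to the database total.
categories_sum = large_mutations + small_mutations + mid_intronic

# The small-mutation subtypes should add up to the small-mutation count.
small_sum = sum(small_breakdown.values())

# Share of small mutations among all mutations, and of nonsense
# mutations among small mutations.
small_share = pct(small_mutations, total_mutations)
nonsense_share = pct(small_breakdown["nonsense"], small_mutations)

if __name__ == "__main__":
    print("top-level categories sum:", categories_sum)
    print("small subtypes sum:", small_sum)
    print("small mutations, % of total:", small_share)
    print("nonsense, % of small mutations:", nonsense_share)
```

    Running the script prints the recomputed sums and shares so they can be compared with the figures in the abstract.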

    De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes

    Background Euphorbia fischeriana is an important medicinal plant found in Northeast China. The plant roots contain many medicinal compounds including 12-deoxyphorbol-13-acetate, commonly known as prostratin, which is a phorbol ester from the tigliane diterpene series. Prostratin is a protein kinase C activator and is effective in the treatment of Human Immunodeficiency Virus (HIV) by acting as a latent HIV activator. Latent HIV is currently the biggest limitation for viral eradication. The aim of this study was to sequence, assemble and annotate the E. fischeriana transcriptome to better understand the potential biochemical pathways leading to the synthesis of prostratin and other related diterpene compounds. Results In this study we used a high-throughput RNA-seq approach to sequence the root transcriptome of E. fischeriana. We assembled 18,180 transcripts; of these, the majority encoded protein-coding genes and only 17 transcripts corresponded to known RNA genes. Interestingly, we identified 5,956 protein-coding transcripts with high similarity (>=75%) to Ricinus communis, a close relative of E. fischeriana. We also evaluated the conservation of E. fischeriana genes against EST datasets from the Euphorbiaceae family, which included R. communis, Hevea brasiliensis and Euphorbia esula. We identified a core set of 1,145 gene clusters conserved in all four species and 1,487 E. fischeriana paralogous genes. Furthermore, we screened E. fischeriana transcripts against an in-house reference database for genes implicated in the biosynthesis of upstream precursors to prostratin. This identified 24 and 9 candidate transcripts involved in the terpenoid and diterpenoid biosynthesis pathways, respectively. 
The majority of the candidate genes in these pathways presented relatively low expression levels except for 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS) and isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS), which are required for multiple downstream pathways including synthesis of casbene, a proposed precursor to prostratin. Conclusion The resources generated in this study provide new insights into the upstream pathways to the synthesis of prostratin and will likely facilitate functional studies aiming to produce larger quantities of this compound for HIV research and/or treatment of patients

    Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.

    Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. 'Morex' was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX).

    The Complete Genome Sequence of the Pathogenic Intestinal Spirochete Brachyspira pilosicoli and Comparison with Other Brachyspira Genomes

    Background: The anaerobic spirochete Brachyspira pilosicoli colonizes the large intestine of various species of birds and mammals, including humans. It causes "intestinal spirochetosis", a condition characterized by mild colitis, diarrhea and reduced growth. This study aimed to sequence and analyse the bacterial genome to investigate the genetic basis of its specialized ecology and virulence. Methodology/Principal Findings: The genome of B. pilosicoli 95/1000 was sequenced, assembled and compared with that of the pathogenic Brachyspira hyodysenteriae and a near-complete sequence of Brachyspira murdochii. The B. pilosicoli genome was circular, composed of 2,586,443 bp with a 27.9 mol% G+C content, and encoded 2,338 genes. The three Brachyspira species shared 1,087 genes and showed evidence of extensive genome rearrangements. Despite minor differences in predicted protein functional groups, the species had many similar features including core metabolic pathways. Genes distinguishing B. pilosicoli from B. hyodysenteriae included those for a previously undescribed bacteriophage that may be useful for genetic manipulation, for a glycine reductase complex allowing use of glycine whilst protecting from oxidative stress, and for aconitase and related enzymes in the incomplete TCA cycle, allowing glutamate synthesis and function of the cycle during oxidative stress. B. pilosicoli had substantially fewer methyl-accepting chemotaxis genes than B. hyodysenteriae and hence these species are likely to have different chemotactic responses that may help to explain their different host range and colonization sites. B. pilosicoli lacked the gene for a new putative hemolysin identified in B. hyodysenteriae WA1. Both B. pilosicoli and B. murdochii lacked the rfbBADC gene cluster found on the B. hyodysenteriae plasmid, and hence were predicted to have different lipooligosaccharide structures. Overall, B. pilosicoli 95/1000 had a variety of genes potentially contributing to virulence. Conclusions/Significance: The availability of the complete genome sequence of B. pilosicoli 95/1000 will facilitate functional genomics studies aimed at elucidating host-pathogen interactions and virulence.