16 research outputs found

    The Annotometer; Encouraging uptake and use of freely available curation tools.

    No full text
    In recent years it has become clear that the amount of data being generated worldwide cannot be curated and annotated by any individual or small group. It is now recognized that one of the best ways to provide ongoing curation is to employ the power of the community. The first hurdle was the development of user-friendly tools and apps that non-expert curators would be comfortable with and capable of using. Such tools are now in place, including IclikVal (http://iclikval.riken.jp) and Hypothes.is (https://hypothes.is).

    The second problem, which we are now facing, is bringing together and engaging the large number of people needed to perform the curation required to empower these tools. To do this, we need to tip the balance from "wouldn't it be great if there was an app to help me find..." to "the information I need is easy to find in app X". To achieve this, these apps first need to be seeded with useful information that will allow users to realize their utility and begin both to use them habitually and to add information to them. This will make these tools ever more useful and help them become a standard part of the process of carrying out research.

    Here we present a competition to encourage the uptake of two of the most mature general curation tools currently available. The competition will take place during the 3 days of the Biocuration2016 conference and includes a prize to be awarded to the person who adds the most annotations. The number of annotations will be tracked by a purpose-built tool, the Annotometer, the code of which will be made available after the conference for re-use by anyone wishing to run a similar event.

    Users will be asked to register their usernames for the two tools on the Annotometer website; those usernames will then be used to poll the APIs of IclikVal and Hypothes.is for usage statistics. Everyone will be able to view the leaderboard of annotators (updated every few minutes) at any time during the conference.
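The polling-and-ranking step described in this abstract can be illustrated with a minimal Python sketch. It is not the Annotometer code, which the abstract says would be released after the conference. The function name `build_leaderboard`, the tool keys, and the per-tool "fetcher" callables are all hypothetical; the fetchers are stubs standing in for whatever API calls the real tool makes, since the abstract does not describe those endpoints.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical type: given a username on one tool, return that user's
# annotation count. In a real deployment this would wrap an API request.
CountFetcher = Callable[[str], int]


def build_leaderboard(
    registrations: Dict[str, Dict[str, str]],
    fetchers: Dict[str, CountFetcher],
) -> List[Tuple[str, int]]:
    """Sum each participant's annotation counts across tools and rank them.

    registrations maps participant -> {tool name -> username on that tool}.
    fetchers maps tool name -> function returning a count for a username.
    """
    totals: Counter = Counter()
    for participant, usernames in registrations.items():
        totals[participant] += 0  # list participants even with no annotations
        for tool, username in usernames.items():
            fetch = fetchers.get(tool)
            if fetch is None:
                continue  # unknown tool: skip rather than fail the whole poll
            totals[participant] += fetch(username)
    # Highest total first; ties are broken alphabetically for a stable display.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented example data; the stub dictionaries replace real API responses.
iclikval_counts = {"alice_ic": 12, "bob_ic": 30}
hypothesis_counts = {"alice_h": 25, "bob_h": 3}
example_fetchers = {
    "iclikval": lambda user: iclikval_counts.get(user, 0),
    "hypothesis": lambda user: hypothesis_counts.get(user, 0),
}
example_registrations = {
    "alice": {"iclikval": "alice_ic", "hypothesis": "alice_h"},
    "bob": {"iclikval": "bob_ic", "hypothesis": "bob_h"},
}
example_leaderboard = build_leaderboard(example_registrations, example_fetchers)
```

Re-running `build_leaderboard` on a timer (every few minutes, as the abstract describes) would refresh the ranking; keeping the fetchers injectable means the ranking logic can be tested without network access.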

    Enhancements to the GigaScience Integrated Data & Research Object Publishing Pipeline

    No full text
    In the era of computation- and data-driven research, traditional methods of disseminating research are no longer fit for purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. As datasets get larger and more challenging to disseminate, one approach is to focus more on compute and on interactive research objects such as containers and virtual machines. Publishing more technically challenging and dynamic parts of the research cycle will require more transparent and interactive approaches to review, annotate, and credit the hard work of those assessing them, particularly to avoid the growing challenges of the replication crisis. GigaScience is an open-access, open-data journal tailored for the era of large-scale biological data. Using the data handling infrastructure of the genomics center BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data and provides additional analysis tools and computing resources. In addition, the supporting workflows and methods are also integrated. GigaDB has released many new and previously unpublished datasets and data types, and the latest version has a number of new and improved features. Along with a raft of major under-the-bonnet changes to the data structure, a submission wizard, a new search function and results display, and integration of the Hypothes.is web-annotation tools have all been implemented. Web forms integrate the manuscript and data peer review process with Publons, linking and crediting the peer reviews with DataCite DOIs. Protocols.io is also being merged with the data curation process to make it easier for authors to enter their methodologies in the collaborative protocol-centered repository. Other “executable” research objects such as workflows, virtual machines, Docker containers and software snapshots from several GigaScience articles have been archived and shared in the most open, reproducible, transparent and usable formats possible.

    GigaScience; the journal and database, for open access publishing and data dissemination.

    No full text
    GigaScience (http://www.gigasciencejournal.com) is an online, open-access journal that includes, as part of its publishing activities, the database GigaDB (http://www.gigadb.org). GigaScience is co-published by BGI and BioMed Central to meet the needs of a new generation of biological and biomedical research as it enters the era of “big data.” The journal’s scope covers studies from the entire spectrum of the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, from where they can be cited to provide a direct link between the study and the data supporting it, as well as access to relevant tools for reproducing or reusing these data.

    GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study. Through our association with DataCite, each dataset in GigaDB is assigned a DOI that can be used as a standard citation for future use of these data in other articles by the authors and other researchers. To enable this, all datasets have a title, an author list, and an abstract that provides information specific to the data. To maximize their utility to the research community, all data in GigaDB are placed under a CC0 waiver.

    We currently host two very popular cancer datasets: “Hepatocellular carcinoma genomic data from the Asia Cancer Research Group” (http://doi.org/10.5524/100034) and “Single cell whole-exome sequences of bladder cancer from an individual” (http://dx.doi.org/10.5524/100037). The latter contains both the assembled transcriptomes and all the SNP call data.

    GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis

    No full text
    Today's next-generation sequencing (NGS) experiments generate substantially more data and are more broadly applicable than previous high-throughput genomic assays. Despite the plummeting costs of sequencing, downstream data processing and analysis create financial and bioinformatics challenges for many biomedical scientists. It is therefore important to make NGS data interpretation as accessible as data generation. GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk) offers an NGS data interpretation solution to the big sequencing data challenge. We have ported the popular Short Oligonucleotide Analysis Package (http://soap.genomics.org.cn) into the Galaxy framework to provide seamless NGS mapping, de novo assembly, NGS data format conversion and sequence alignment visualization. Our vision is to create an open publication, review and analysis environment by integrating GigaGalaxy into the publication platform at GigaScience and its GigaDB database, which links to more than 25 TB of genomic data. We have begun this effort by re-implementing the data procedures described by Luo et al. (GigaScience 1:18, 2012) as Galaxy workflows so that they can be shared in a form that can be visualized and executed in GigaGalaxy. We have also described the experiment using the ISA framework to provide a richer and more interoperable description of the experimental workflows. We hope to revolutionize the publication model with the aim of executable publications, where data analyses can be reproduced and reused.

    Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data-1

    No full text
    Copyright information: Taken from "Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data" (http://www.biomedcentral.com/1471-2105/9/334). BMC Bioinformatics 2008;9:334. Published online 7 Aug 2008. PMCID: PMC2528018.

    A schematic diagram showing the relationship between Taverna, the RShell processor, RServe and the R tool

    No full text
    Copyright information: Taken from "Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data" (http://www.biomedcentral.com/1471-2105/9/334). BMC Bioinformatics 2008;9:334. Published online 7 Aug 2008. PMCID: PMC2528018.

    Multi-Platform Analysis of MicroRNA Expression Measurements in RNA from Fresh Frozen and FFPE Tissues

    MicroRNAs play a role in regulating diverse biological processes and have considerable utility as molecular markers for diagnosis and monitoring of human disease. Several technologies are available commercially for measuring microRNA expression. However, results from different platforms do not necessarily correlate well, making it difficult to determine which platform most closely represents the true microRNA expression level in a tissue. To address this issue, we have analyzed RNA derived from cell lines, as well as fresh frozen and formalin-fixed paraffin-embedded (FFPE) tissues, using Affymetrix, Agilent, and Illumina microRNA arrays, NanoString counting, and Illumina next-generation sequencing. We compared performance within and between the different platforms, and then verified these results against quantitative PCR data. Our results demonstrate that within-platform reproducibility for each method is consistently high and that, although the expression profiles from each platform show unique traits, detection of microRNA transcripts was similar across platforms when the comparison was restricted to commonly detectable microRNAs.
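The idea of comparing platforms only on commonly detectable microRNAs can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's method: the abstract does not say which agreement statistic was used, so Pearson correlation is an assumed choice here, and the function names and expression values are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def cross_platform_correlation(
    platform_a: Dict[str, float],
    platform_b: Dict[str, float],
) -> Tuple[float, List[str]]:
    """Correlate two platforms over the microRNAs both of them detected.

    Each argument maps microRNA name -> expression value for one sample.
    A microRNA missing from a dictionary is treated as not detected.
    """
    shared = sorted(set(platform_a) & set(platform_b))
    r = pearson([platform_a[m] for m in shared],
                [platform_b[m] for m in shared])
    return r, shared


# Invented illustrative values, not data from the study.
array_values = {"miR-21": 1.0, "miR-155": 2.0, "miR-16": 3.0, "let-7a": 9.0}
sequencing_values = {"miR-21": 2.0, "miR-155": 4.0, "miR-16": 6.0, "miR-122": 1.0}
example_r, example_shared = cross_platform_correlation(array_values, sequencing_values)
```

Restricting the comparison to the shared set matters because a microRNA detected on only one platform has no counterpart value to correlate against; in practice, values would also need to be put on comparable scales (for example, log-transformed) before such a comparison.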