16 research outputs found
The Annotometer; Encouraging uptake and use of freely available curation tools.
<p>In recent years it has
become clear that the amount of data being generated worldwide cannot be
curated and annotated by any individual or small group. Currently, there is
recognition that one of the best ways to provide ongoing curation is to employ
the power of the community. To achieve this, the first hurdle to overcome was
the development of user-friendly tools and apps that non-expert curators would
be comfortable with and capable of using. Such tools are now in place, including
IclikVal (<a href="http://iclikval.riken.jp/" target="_blank">http://iclikval.riken.jp</a>) and Hypothes.is (<a href="https://hypothes.is/" target="_blank">https://hypothes.is</a>).<br>
The
second problem, which we are now facing, is bringing together and engaging the large number of people needed
to perform the curation required to empower these tools. To do this, we need to
tip the balance from "wouldn't it be great if there was an app to help me
find..." to "the information I need is easy to find in app X".
To achieve this, these apps first need to be seeded with useful information
that will allow users to realize their utility and begin both to use them
habitually and to add information to them. This will make these tools ever more
useful and a standard part of the process of carrying out research.<br>
Here we
present a competition to encourage the uptake of two of the
most mature general curation tools currently available. The competition will
take place during the 3 days of the Biocuration2016 conference and includes a
prize to be awarded to the person who adds the most annotations. The
number of annotations will be tracked by a purpose-built tool, the
Annotometer, the code of which will be made available after the conference for
re-use by anyone wishing to run a similar event.<br>
Users
will be asked to register their usernames for the two tools on the Annotometer
website. Those usernames will then be used to poll the APIs for IclikVal and
Hypothes.is usage stats. Everyone will be able to view the leaderboard of
annotators (updated every few minutes) at any time during the conference.</p>
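The leaderboard step described above can be sketched roughly as follows. This is a minimal illustration, not the Annotometer's actual code: the function name `build_leaderboard`, the data shapes, and the tool keys are all assumptions, and the per-tool annotation counts are assumed to have already been fetched from each tool's API.

```python
from typing import Dict, List, Tuple


def build_leaderboard(
    registrations: Dict[str, Dict[str, str]],
    counts_by_tool: Dict[str, Dict[str, int]],
) -> List[Tuple[str, int]]:
    """Rank participants by total annotations across all tools.

    registrations maps a participant to the username registered for
    each tool (hypothetical shape). counts_by_tool maps a tool to the
    annotation count per username, as obtained from that tool's API.
    """
    totals: Dict[str, int] = {}
    for participant, usernames in registrations.items():
        # A missing tool or unknown username contributes zero annotations.
        totals[participant] = sum(
            counts_by_tool.get(tool, {}).get(username, 0)
            for tool, username in usernames.items()
        )
    # Highest total first; ties are broken alphabetically by participant.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Re-running such a function each time fresh counts are polled would give the periodically updated ranking the abstract mentions.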
Enhancements to the GigaScience Integrated Data & Research Object Publishing Pipeline
In the era of computation- and data-driven research, traditional methods of disseminating research are no longer fit for purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. As datasets get larger and more challenging to disseminate, one approach is to focus more on compute and on interactive research objects such as containers and virtual machines. Publishing more technically challenging and dynamic parts of the research cycle will require more transparent and interactive approaches to review, annotate, and credit the hard work of those assessing them, particularly to avoid the growing challenges of the replication crisis. <i>GigaScience</i> is an open-access, open-data journal tailored for the era of large-scale biological data. Using the data handling infrastructure of the genomics center BGI, <i>GigaScience</i> links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data and provides additional analysis tools and computing resources. The supporting workflows and methods are also integrated. GigaDB has released many new and previously unpublished datasets and data types, and the latest version has a number of new and improved features. Alongside a raft of major under-the-bonnet changes to the data structure, a submission wizard, a new search function and results display, and integration of the Hypothes.is web-annotation tools have all been implemented. Web forms integrate the manuscript and data peer review process with Publons, linking and crediting the peer reviews with DataCite DOIs. Protocols.io is also being integrated into the data curation process, making it easier for authors to enter their methodologies in this collaborative protocol-centered repository.
Other “executable” research objects such as workflows, virtual machines, Docker containers and software snapshots from several <i>GigaScience</i> articles have been archived and shared in the most open, reproducible, transparent and usable formats possible.
GigaScience; the journal and database, for open access publishing and data dissemination.
<p>GigaScience (http://www.gigasciencejournal.com) is an online, open-access journal that includes, as part of its publishing activities, the database GigaDB (http://www.gigadb.org). GigaScience is co-published in collaboration between BGI and BioMed Central, to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data.” The journal’s scope covers studies from the entire spectrum of the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, from where they can be cited to provide a direct link between the study and the data supporting it, as well as access to relevant tools for reproducing or reusing these data.</p>
<p>GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study. Through our association with DataCite, each dataset in GigaDB is assigned a DOI that can be used as a standard citation for future use of these data in other articles by the authors and other researchers. To enable this, all datasets have a title, an author list, and an abstract that provides information specific to the data. To maximize their utility to the research community, all data in GigaDB are placed under a CC0 waiver.</p>
<p>We currently host two very popular cancer datasets, “Hepatocellular carcinoma genomic data from the Asia Cancer Research Group” (http://doi.org/10.5524/100034) and “Single cell whole-exome sequences of bladder cancer from an individual” (http://dx.doi.org/10.5524/100037). The latter contains both the assembled transcriptomes and all the SNP call data.</p>
GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis
<p>Today's next-generation sequencing (NGS) experiments generate substantially more data and are more broadly applicable than previous high-throughput genomic assays. Despite the plummeting costs of sequencing, downstream data processing and analysis create financial and bioinformatics challenges for many biomedical scientists. It is therefore important to make NGS data interpretation as accessible as data generation. GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk) is an NGS data interpretation solution for the big sequencing data challenge. We have ported the popular Short Oligonucleotide Analysis Package (http://soap.genomics.org.cn) into the Galaxy framework to provide seamless NGS mapping, de novo assembly, NGS data format conversion and sequence alignment visualization. Our vision is to create an open publication, review and analysis environment by integrating GigaGalaxy into the publication platform at GigaScience and its GigaDB database, which links to more than 25 TB of genomic data. We have begun this effort by re-implementing the data procedures described by Luo et al. (GigaScience 1:18, 2012) as Galaxy workflows so that they can be shared in a form that can be visualized and executed in GigaGalaxy. We have also described the experiment using the ISA framework to provide a richer and more interoperable description of the experimental workflows. We hope to revolutionize the publication model with the aim of executable publications, where data analyses can be reproduced and reused.</p>
Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data-1
<p><b>Copyright information:</b></p><p>Taken from "Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data"</p><p>http://www.biomedcentral.com/1471-2105/9/334</p><p>BMC Bioinformatics 2008;9:334.</p><p>Published online 7 Aug 2008</p><p>PMCID: PMC2528018.</p>
A schematic diagram showing the relationship between Taverna, the RShell processor, RServe and the R tool
<p><b>Copyright information:</b></p><p>Taken from "Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data"</p><p>http://www.biomedcentral.com/1471-2105/9/334</p><p>BMC Bioinformatics 2008;9:334.</p><p>Published online 7 Aug 2008</p><p>PMCID: PMC2528018.</p>
Another view of the complementary aspects of these research object models, highlighting the reliance on persistent identifiers (such as ORCID), and references to Galaxy workflows hosted on GigaScience servers.
<p>Another view of the complementary aspects of these research object models, highlighting the reliance on persistent identifiers (such as ORCID), and references to Galaxy workflows hosted on GigaScience servers.</p>
Multi-Platform Analysis of MicroRNA Expression Measurements in RNA from Fresh Frozen and FFPE Tissues
<div><p>MicroRNAs play a role in regulating diverse biological processes and have considerable utility as molecular markers for diagnosis and monitoring of human disease. Several technologies are available commercially for measuring microRNA expression. However, cross-platform comparisons do not necessarily correlate well, making it difficult to determine which platform most closely represents the true microRNA expression level in a tissue. To address this issue, we have analyzed RNA derived from cell lines, as well as fresh frozen and formalin-fixed paraffin-embedded tissues, using Affymetrix, Agilent, and Illumina microRNA arrays, NanoString counting, and Illumina Next Generation Sequencing. We compared performance within and between the different platforms, and then verified these results against quantitative PCR data. Our results demonstrate that the within-platform reproducibility for each method is consistently high and that, although the gene expression profiles from each platform show unique traits, comparison of genes that were commonly detectable showed that detection of microRNA transcripts was similar across multiple platforms.</p> </div>
Fractional deviation from the mean miRNA expression for the top ranked 100 miRNA transcripts.
<p>For each sample (A–F), the fractional deviation was plotted by each platform against the mean scaled expression of the ranked miRNA transcripts.</p>