
16 research outputs found

The Annotometer; Encouraging uptake and use of freely available curation tools.

Author: Christopher Hunter (1230069)
Laurie Goodman (508466)
Peter Li (19492)
Scott Edmunds (418976)
Xiao Si Zhe (2592946)

In recent years it has become clear that the amount of data being generated worldwide cannot be curated and annotated by any individual or small group. Currently, there is recognition that one of the best ways to provide ongoing curation is to employ the power of the community. To achieve this, the first hurdle to overcome was the development of user-friendly tools and apps that non-expert curators would be comfortable with and capable of using. Such tools are now in place, including IclikVal (http://iclikval.riken.jp) and Hypothes.is (https://hypothes.is). The second problem, which we are now facing, is bringing together and engaging the large number of people needed to perform the curation required to empower these tools. To do this, we need to tip the balance from "wouldn't it be great if there was an app to help me find..." to "the information I need is easy to find in app X". To achieve this, these apps first need to be seeded with useful information that will allow users to realize their utility and begin to both habitually use and add information to these apps. This will make these tools ever more useful and help them become a standard part of the process of carrying out research. Here we have prepared and present a competition to encourage the uptake of two of the most mature general curation tools currently available. The competition will take place during the 3 days of the Biocuration2016 conference and includes a prize to be awarded to the person who adds the most annotations. The number of annotations will be tracked by a purpose-built tool, the Annotometer, the code of which will be made available after the conference for re-use by anyone wishing to run a similar event.
Users will be asked to register their usernames for the two tools on the Annotometer website; those usernames will then be used to poll the APIs for IclikVal and Hypothes.is usage stats. Everyone will be able to view the leaderboard of annotators (updated every few minutes) at any time during the conference.

FigShare

Enhancements to the GigaScience Integrated Data & Research Object Publishing Pipeline

Author: Christopher Hunter (1230069)
Laurie Goodman (508466)
Nicole Nogoy (720801)
Peter Li (19492)
Scott Edmunds (418976)
Xiao Si Zhe (2592946)

In the era of computation- and data-driven research, traditional methods of disseminating research are no longer fit for purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. As datasets get larger and more challenging to disseminate, one approach is to focus more on the compute and interactive research objects such as containers and virtual machines. Publishing more technically challenging and dynamic parts of the research cycle will require more transparent and interactive approaches to review, annotate, and credit the hard work of those assessing them, particularly to avoid the growing challenges of the replication crisis. GigaScience is an open-access, open-data journal tailored for the era of large-scale biological data. Using the data handling infrastructure of the genomics center BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data and provides additional analysis tools and computing resources. In addition, the supporting workflows and methods are also integrated. GigaDB has released many new and previously unpublished datasets and data types, and the latest version has a number of new and improved features. Along with a raft of major under-the-bonnet changes to the data structure and submission wizard, a new search function and results display and the integration of the Hypothes.is web-annotation tools have been implemented. Web forms integrate the manuscript and data peer review process with Publons, linking and crediting the peer reviews with DataCite DOIs. Protocols.io is also being merged with the data curation process to streamline the process for authors to enter their methodologies in the collaborative protocol-centered repository.
Other “executable” research objects such as workflows, virtual machines, Docker containers and software snapshots from several GigaScience articles have been archived and shared in the most open, reproducible, transparent and usable formats possible.

FigShare

GigaScience; the journal and database, for open access publishing and data dissemination.

Author: Chris Hunter (208050)
Laurie Goodman (454234)
Peter Li (19492)
Robert L Davidson (698298)
Scott Edmunds (418976)
Si Zhe Xiao (454233)

GigaScience (http://www.gigasciencejournal.com) is an online, open-access journal that includes, as part of its publishing activities, the database GigaDB (http://www.gigadb.org). GigaScience is co-published in collaboration between BGI and BioMed Central, to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data.” The journal’s scope covers studies from the entire spectrum of the life sciences that produce and use large-scale data as the center of their work. Data from these articles are hosted in GigaDB, from where they can be cited to provide a direct link between the study and the data supporting it, as well as access to relevant tools for reproducing or reusing these data. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study. Through our association with DataCite, each dataset in GigaDB is assigned a DOI that can be used as a standard citation for future use of these data in other articles by the authors and other researchers. To enable this, all datasets have a title, an author list, and an abstract that provides information specific to the data. To maximize its utility to the research community, all data in GigaDB are placed under a CC0 waiver. We currently host two very popular cancer datasets, “Hepatocellular carcinoma genomic data from the Asia Cancer Research Group” (http://doi.org/10.5524/100034) and “Single cell whole-exome sequences of bladder cancer from an individual” (http://dx.doi.org/10.5524/100037). The latter contains both the assembled transcriptomes and all the SNP call data.

FigShare

GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis

Author: Alex Wong (96273)
Chris Hunter (208050)
Dennis Chan (419032)
Huayan Gao (419031)
ISA-Team (454235)
Laurie Goodman (454234)
Peter Li (19492)
Ruibang Luo (237944)
Scott Edmunds (418976)
Si Zhe Xiao (454233)
Tin-Lap Lee (18788)
Yong Zhang (5893)

Today's next generation sequencing (NGS) experiments generate substantially more data and are more broadly applicable than previous high-throughput genomic assays. Despite the plummeting costs of sequencing, downstream data processing and analysis create financial and bioinformatics challenges for many biomedical scientists. It is therefore important to make NGS data interpretation as accessible as data generation. GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk) represents an NGS data interpretation solution to the big sequencing data challenge. We have ported the popular Short Oligonucleotide Analysis Package (http://soap.genomics.org.cn) into the Galaxy framework to provide seamless NGS mapping, de novo assembly, NGS data format conversion and sequence alignment visualization. Our vision is to create an open publication, review and analysis environment by integrating GigaGalaxy into the publication platform at GigaScience and its GigaDB database, which links to more than 25 TB of genomic data. We have begun this effort by re-implementing the data procedures described by Luo et al. (GigaScience 1: 18, 2012) as Galaxy workflows so that they can be shared in a manner which can be visualized and executed in GigaGalaxy. We have also described the experiment using the ISA framework to provide a richer and more interoperable description of the experimental workflows. We hope to revolutionize the publication model with the aim of executable publications, where data analyses can be reproduced and reused.

FigShare

Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data-1

Author: Carole A Goble (80640)
David Withers (80638)
Douglas B Kell (46500)
Giles Velarde (64303)
Ingo Wassink (80635)
Juan I Castrillo (46763)
Matthew R Pocock (80639)
Peter Li (19492)
Stephen G Oliver (40784)
Stian Soiland-Reyes (99655)
Stuart Owen (80637)
Tom Oinn (19493)

Copyright information: Taken from "Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data". http://www.biomedcentral.com/1471-2105/9/334. BMC Bioinformatics 2008;9:334. Published online 7 Aug 2008. PMCID: PMC2528018.

FigShare

A schematic diagram showing the relationship between Taverna, the RShell processor, RServe and the R tool

Author: Carole A Goble (80640)
David Withers (80638)
Douglas B Kell (46500)
Giles Velarde (64303)
Ingo Wassink (80635)
Juan I Castrillo (46763)
Matthew R Pocock (80639)
Peter Li (19492)
Stephen G Oliver (40784)
Stian Soiland-Reyes (99655)
Stuart Owen (80637)
Tom Oinn (19493)

FigShare

Another view of the complementary aspects of these research object models, highlighting the reliance of persistent identifiers (such as ORCID), and references to Galaxy workflows hosted on GigaScience Servers.

Author: Alejandra González-Beltrán (5664307)
Peter Li (19492)
Jun Zhao (59250)
Maria Susana Avila-Garcia (767122)
Marco Roos (533945)
Mark Thompson (533940)
Eelke van der Horst (767123)
Rajaram Kaliyaperumal (2504425)
Ruibang Luo (237944)
Tin-Lap Lee (18788)
Tak-wah Lam (5664310)
Scott C. Edmunds (767124)
Susanna-Assunta Sansone (15155)
Philippe Rocca-Serra (18677)
Publication date: 08/07/2015

Another view of the complementary aspects of these research object models, highlighting the reliance of persistent identifiers (such as ORCID), and references to Galaxy workflows hosted on GigaScience Servers.

FigShare

Archivo Digital UPM

Another view of the complementary aspects of these research object models, highlighting the reliance of persistent identifiers (such as ORCID), and references to Galaxy workflows hosted on GigaScience Servers.

Author: Alejandra González-Beltrán (5664307)
Eelke van der Horst (767123)
Jun Zhao (59250)
Marco Roos (533945)
Maria Susana Avila-Garcia (767122)
Mark Thompson (533940)
Peter Li (19492)
Philippe Rocca-Serra (18677)
Rajaram Kaliyaperumal (2504425)
Ruibang Luo (237944)
Scott C. Edmunds (767124)
Susanna-Assunta Sansone (15155)
Tak-wah Lam (5664310)
Tin-Lap Lee (18788)

FigShare

Multi-Platform Analysis of MicroRNA Expression Measurements in RNA from Fresh Frozen and FFPE Tissues

Author: Ann L. Oberg (217897)
Bruce W. Eckloff (278399)
Christopher P. Kolbert (160073)
Debra A. Schultz (278397)
Diane E. Grill (278393)
E. Aubrey Thompson (224877)
Eric D. Wieben (278400)
Fariborz Rakhshan (278392)
Gyorgy Simon (278394)
Jennifer M. Carr (224872)
Jin Jen (160075)
Jin Sung Jang (278395)
Michael Zschunke (278398)
Peter Li (19492)
Ping Yang (56755)
Rod M. Feddersen (278391)
Sumit Middha (93228)
Vernadette Simon (278396)
Wilma Lingle (376980)
Publication date: 31/01/2013

MicroRNAs play a role in regulating diverse biological processes and have considerable utility as molecular markers for diagnosis and monitoring of human disease. Several technologies are available commercially for measuring microRNA expression. However, cross-platform comparisons do not necessarily correlate well, making it difficult to determine which platform most closely represents the true microRNA expression level in a tissue. To address this issue, we have analyzed RNA derived from cell lines, as well as fresh frozen and formalin-fixed paraffin embedded tissues, using Affymetrix, Agilent, and Illumina microRNA arrays, NanoString counting, and Illumina Next Generation Sequencing. We compared the performance within and between the different platforms, and then verified these results with those of quantitative PCR data. Our results demonstrate that the within-platform reproducibility for each method is consistently high and, although the gene expression profiles from each platform show unique traits, comparison of genes that were commonly detectable showed that detection of microRNA transcripts was similar across multiple platforms.

Directory of Open Access Journals

PubMed Central

FigShare

Fractional deviation from the mean miRNA expression for the top ranked 100 miRNA transcripts.

Author: Ann L. Oberg (217897)
Bruce W. Eckloff (278399)
Christopher P. Kolbert (160073)
Debra A. Schultz (278397)
Diane E. Grill (278393)
E. Aubrey Thompson (224877)
Eric D. Wieben (278400)
Fariborz Rakhshan (278392)
Gyorgy Simon (278394)
Jennifer M. Carr (224872)
Jin Jen (160075)
Jin Sung Jang (278395)
Michael Zschunke (278398)
Peter Li (19492)
Ping Yang (56755)
Rod M. Feddersen (278391)
Sumit Middha (93228)
Vernadette Simon (278396)
Wilma Lingle (376980)

For each sample (A–F), the fractional deviation was plotted by each platform against the mean scaled expression of the ranked miRNA transcripts.

FigShare