9 research outputs found

The RCSB Protein Data Bank: redesigned web site and web services

Author: A. Prlic
Ashburner
B. Beran
B. Yukich
Berman
Berman
C. Bi
C. Zardecki
Chung
Cuff
D. Dimitropoulos
D. S. Goodsell
Deshpande
Dutta
G. B. Quinn
H. M. Berman
Henrick
Hopf
J. D. Westbrook
J. Young
Klein
Lukasik
M. Quesada
P. E. Bourne
P. W. Rose
Potluri
Shindyalov
Shinobu
Tanaka
Tars
W. F. Bluhm
Publication venue: Oxford University Press
Publication date: 01/01/2010

The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.

CiteSeerX

Crossref

PubMed Central

Open Access: Taking Full Advantage of the Content

Author: Bourne Philip E.
Fink J. Lynn
Gerstein Mark
Publication venue: Public Library of Science
Publication date: 01/03/2008

Crossref

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Quality assurance for the query and distribution systems of the RCSB Protein Data Bank

Author: A. Prlic
B. Beran
B. Yukich
Berman
Berman
Bhat
Brenner
C. Bi
C. Shah
D. Dimitropoulos
Deshpande
Devos
G. B. Quinn
Gilks
H. M. Berman
J. Young
Koru
Nilsson
P. E. Bourne
P. W. Rose
Prlic
W. F. Bluhm
Westbrook
Westbrook
Publication venue: Oxford University Press

The RCSB Protein Data Bank (RCSB PDB, www.pdb.org) is a key online resource for structural biology and related scientific disciplines. The website is used on average by 165 000 unique visitors per month, and more than 2000 other websites link to it. The amount and complexity of PDB data as well as the expectations on its usage are growing rapidly. Therefore, ensuring the reliability and robustness of the RCSB PDB query and distribution systems is crucially important and increasingly challenging. This article describes quality assurance for the RCSB PDB website at several distinct levels, including: (i) hardware redundancy and failover, (ii) testing protocols for weekly database updates, (iii) testing and release procedures for major software updates and (iv) miscellaneous monitoring and troubleshooting tools and practices. As such, it provides suggestions for how other websites might be operated.

Crossref

PubMed Central

Utopia documents: linking scholarly literature with research data

Author: Attwood
Boeckmann
Ceol
D. B. Kell
D. Thorne
Giles
Giles
J. Marsh
Kouranov
Kumar
Lee
P. McDermott
Pafilis
Renear
Ruthensteiner
S. R. Pettifer
Seringhaus
Shotton
Smith
T. K. Attwood
Publication venue: Oxford University Press
Publication date: 04/09/2010

Motivation: In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Text mining for the biocuration workflow

Author: Arighi Cecilia
Burns Gully A. P. C
Chatr-Aryamontri Andrew
Cohen K. Bretonnel
Dowell Karen G.
Hirschman Lynette
Huala Eva
Krallinger Martin
Lourenço Anália
Nash Robert
Valencia Alfonso
Veuthey Anne-Lise
Wiegers Thomas
Winter Andrew G.
Wu Cathy H.
Publication venue: Oxford University Press
Publication date: 01/01/2012

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

Universidade do Minho: RepositoriUM

PubMed Central

Calling International Rescue: knowledge lost in literature and data landslide!

We live in interesting times. Portents of impending catastrophe pervade the literature, calling us to action in the face of unmanageable volumes of scientific data. But it isn't so much data generation per se, but the systematic burial of the knowledge embodied in those data that poses the problem: there is so much information available that we simply no longer know what we know, and finding what we want is hard – too hard. The knowledge we seek is often fragmentary and disconnected, spread thinly across thousands of databases and millions of articles in thousands of journals. The intellectual energy required to search this array of data-archives, and the time and money this wastes, has led several researchers to challenge the methods by which we traditionally commit newly acquired facts and knowledge to the scientific record. We present some of these initiatives here – a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment. With their promises to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it, we ask what advances they make and what obstacles to progress still exist? We explore these questions, and, as you read on, we invite you to engage in an experiment with us, a real-time test of a new technology to rescue data from the dormant pages of published documents. We ask you, please, to read the instructions carefully. The time has come: you may turn over your papers.

Crossref

PubMed Central

Adventures in data citation: sorghum genome data exemplifies the new gold standard

Author: Alexandra T Basford
Brian Hole
Scott C Edmunds
Tom J Pollard
Publication venue: Springer Nature
Publication date: 01/07/2012

Scientific progress is driven by the availability of information, which makes it essential that data be broadly, easily and rapidly accessible to researchers in every field. In addition to being good scientific practice, provision of supporting data in a convenient way increases experimental transparency and improves research efficiency by reducing unnecessary duplication of experiments. There are, however, serious constraints that limit extensive data dissemination. One such constraint is that, despite providing a major foundation of data to the advantage of the entire community, data producers rarely receive the credit they deserve for the substantial amount of time and effort they spend creating these resources. In this regard, a formal system that provides recognition for data producers would serve to incentivize them to share more of their data. The process of data citation, in which the data themselves are cited and referenced in journal articles as persistently identifiable bibliographic entities, is a potential way to properly acknowledge data output. The recent publication of several sorghum genomes in Genome Biology is a notable first example of good data citation practice in the field of genomics and demonstrates the practicalities and formatting required for doing so. It also illustrates how effective use of persistent identifiers can augment the submission of data to the current standard scientific repositories.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integration of open access literature into the RCSB Protein Data Bank using BioLit

Author: Beran Bojan
Bourne Philip E
Dimitropoulos Dimitris
Fink J Lynn
Martinez Marco A
Prlić Andreas
Rose Peter W
Yukich Benjamin T
Publication venue: BMC
Publication date: 01/01/2010

Background: Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB).

Results: BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases, based on the content of the article. For example, words matching to existing biological ontologies are highlighted and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are mentioned in the open access literature, by parsing the full text for all research articles in PubMed Central (PMC) and exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and subsequently retrieving abstract, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that are providing the underlying data are publicly accessible. A client library is provided that supports querying these services (Java).

Conclusions: The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view for how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight. To our knowledge this is the first time that database and literature have been integrated in this way and it speaks to the opportunities afforded by open and free access to both database and literature content.
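
The abstract above says that PDB IDs are identified by parsing the full text of articles, but it does not say how. The following is only a rough illustration of that idea, not BioLit's actual method. It assumes the classic four-character PDB ID form (a digit from 1 to 9 followed by three alphanumeric characters). The function name `find_pdb_id_candidates` and the rule that a candidate must contain at least one letter (to skip plain numbers such as years) are hypothetical choices made for this sketch.

```python
import re

# Assumption: a classic PDB ID is four characters, a digit 1-9 followed by
# three alphanumerics. Requiring at least one letter among the last three
# characters is a heuristic added for this sketch (it skips years such as
# "2010"); it is not taken from the article.
PDB_ID_PATTERN = re.compile(
    r"\b[1-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b"
)


def find_pdb_id_candidates(text):
    """Return unique PDB-ID-like tokens, upper-cased, in order of first appearance."""
    seen = []
    for match in PDB_ID_PATTERN.finditer(text):
        candidate = match.group(0).upper()
        if candidate not in seen:
            seen.append(candidate)
    return seen


sample = "The structure 4hhb (hemoglobin) was compared with 1TIM in 2010."
print(find_pdb_id_candidates(sample))  # prints ['4HHB', '1TIM']
```

A pattern like this only yields candidates. Any four-character token of the same shape would also match, so a real pipeline would presumably check each candidate against the list of IDs actually present in the archive.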

Springer - Publisher Connector

Directory of Open Access Journals

University of Queensland eSpace