84 research outputs found
On vital aid: the why, what and how of validation
The need for validation of macromolecular crystal structures is discussed. A general approach to validation is presented, together with examples of its implementation in the special case of macromolecular crystallography.
Case-controlled structure validation.
Although many factors influence the quality of a macromolecular crystal structure, validation criteria are usually only calibrated using one of these factors, the resolution. For many purposes this is sufficient, but there are times when one wishes to compare one set of structures with another and the comparison may be invalidated by systematic differences between the sets in factors other than resolution. This problem can be circumvented by borrowing from medicine the idea of the case-matched control: each structure of interest is matched with a control structure that has similar values for all relevant factors considered in this study. In addition to resolution, these include the size of the structure (as measured by the volume of the asymmetric unit) and the year of deposition. This approach has been applied to address two questions: whether structures from structural genomics efforts reach the same level of quality as structures from traditional sources and whether the impact factor of the journal in which a structure is published correlates with structure quality. In both cases, once factors influencing quality have been controlled in the comparison, there is little evidence for a systematic difference in quality.
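The case-matched control idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the `Structure` fields, the tolerance scales (0.1 Å in resolution, a factor of 1.2 in asymmetric-unit volume, 1 year in deposition date), the `max_distance` cutoff and the greedy nearest-neighbour strategy are all assumptions chosen for illustration, and the identifiers are made up.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Structure:
    pdb_id: str        # hypothetical identifier, not a real PDB entry
    resolution: float  # resolution in angstroms
    asu_volume: float  # volume of the asymmetric unit in cubic angstroms
    year: int          # year of deposition


def distance(a: Structure, b: Structure) -> float:
    """Scaled distance over the three matching factors.

    The scales are assumptions: one unit corresponds to 0.1 angstrom in
    resolution, a factor of 1.2 in volume, or 1 year in deposition date.
    """
    d_res = (a.resolution - b.resolution) / 0.1
    d_vol = math.log(a.asu_volume / b.asu_volume) / math.log(1.2)
    d_year = float(a.year - b.year)
    return math.sqrt(d_res ** 2 + d_vol ** 2 + d_year ** 2)


def match_controls(
    cases: List[Structure],
    pool: List[Structure],
    max_distance: float = 1.5,  # assumed cutoff for an acceptable match
) -> Dict[str, Optional[str]]:
    """Greedily pair each case with its nearest unused control.

    A case with no control within max_distance is mapped to None.
    The pool is assumed not to contain any of the cases themselves.
    """
    used = set()
    matches: Dict[str, Optional[str]] = {}
    for case in cases:
        best: Optional[Structure] = None
        best_d = max_distance
        for candidate in pool:
            if candidate.pdb_id in used:
                continue  # each control is used at most once
            d = distance(case, candidate)
            if d <= best_d:
                best, best_d = candidate, d
        if best is not None:
            used.add(best.pdb_id)
        matches[case.pdb_id] = best.pdb_id if best is not None else None
    return matches
```

Quality metrics would then be compared between each case and its matched control, so that differences in resolution, size or deposition year do not masquerade as differences between the two sets. Greedy matching depends on the order of the cases; an optimal assignment method could be substituted if that matters.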
PDBe: towards reusable data delivery infrastructure at protein data bank in Europe
© 2017 The Authors. Published by OUP. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1093/nar/gkx1070
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.
Wellcome Trust [104948]; UK Biotechnology and Biological Sciences Research Council [BB/M011674/1, BB/N019172/1, BB/M020347/1]; European Union [284209]; European Molecular Biology Laboratory (EMBL). Funding for open access charge: EMBL.
MIFA: Metadata, Incentives, Formats, and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis
Artificial Intelligence methods are powerful tools for biological image
analysis and processing. High-quality annotated images are key to training and
developing new methods, but access to such data is often hindered by the lack
of standards for sharing datasets. We brought together community experts in a
workshop to develop guidelines to improve the reuse of bioimages and
annotations for AI applications. These include standards on data formats,
metadata, data presentation and sharing, and incentives to generate new
datasets. We are positive that the MIFA (Metadata, Incentives, Formats, and
Accessibility) recommendations will accelerate the development of AI tools for
bioimage analysis by facilitating access to high quality training data.
Comment: 16 pages, 3 figures
Genome3D: exploiting structure to help users understand their sequences.
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
- …