88 research outputs found
Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces
© 2016 Kontopoulos et al. Background: The term molecular cartography encompasses a family of computational methods for two-dimensional transformation of protein structures and analysis of their physicochemical properties. The underlying algorithms comprise multiple manual steps, whereas the few existing implementations typically restrict the user to a very limited set of molecular descriptors. Results: We present Structuprint, a free standalone software tool that fully automates the rendering of protein surface maps, given - at the very least - a directory with a PDB file and an amino acid property. The tool comes with a default database of 328 descriptors, which can be extended or substituted by user-provided ones. The core algorithm generates a mould of the protein surface, which is subsequently converted to a sphere and mapped to two dimensions using the Miller cylindrical projection. Structuprint is partly optimized for multicore computers, making the rendering of animations of entire molecular dynamics simulations feasible. Conclusions: Structuprint is an efficient application implementing a molecular cartography algorithm for protein surfaces. According to the results of a benchmark, its memory requirements and execution time are reasonable, allowing it to run even on low-end personal computers. We believe that it will be of use - primarily but not exclusively - to structural biologists and computational biochemists.
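The final sphere-to-plane step named in the abstract, the Miller cylindrical projection, is a standard cartographic formula: x = λ and y = (5/4)·ln(tan(π/4 + 2φ/5)). A minimal sketch (this helper is hypothetical and not Structuprint's code):

```python
import math

def miller_projection(lon_deg, lat_deg):
    """Map spherical coordinates (degrees) to 2D plane coordinates
    using the Miller cylindrical projection, the same projection the
    abstract names for Structuprint's sphere-to-plane step.
    Illustrative helper only, not taken from the tool itself."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = lam                                            # longitude is preserved linearly
    y = 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * phi))
    return x, y

# The equator maps to y = 0, and the projection is symmetric in latitude.
x0, y0 = miller_projection(0.0, 0.0)
```

Unlike the plain Mercator projection, Miller's y stays finite at the poles, which is why it suits mapping an entire closed surface.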
Protein signatures using electrostatic molecular surfaces in harmonic space
We developed a novel method based on the Fourier analysis of protein molecular surfaces to speed up the analysis of the vast structural data generated in the post-genomic era. This method computes the power spectrum of surfaces of the molecular electrostatic potential, whose three-dimensional coordinates have been either experimentally or theoretically determined. We thus reduce the initial three-dimensional information on the molecular surface to one-dimensional information on pairs of points a fixed scale apart. Consequently, the similarity search in our method is computationally less demanding and significantly faster than shape-comparison methods. As proof of principle, we applied our method to a training set of viral proteins that are involved in major diseases such as Hepatitis C, Dengue fever, Yellow fever, Bovine viral diarrhea and West Nile fever. The training set contains proteins of four different protein families, as well as a representative mammalian enzyme. We found that the power spectrum successfully assigns a unique signature to each protein included in our training set, thus providing a direct probe of functional similarity among proteins. The results agree with established biological data from conventional structural biochemistry analyses. (9 pages, 10 figures; published in PeerJ, 2013: https://peerj.com/articles/185)
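The reduction described above rests on the discrete Fourier power spectrum of a sampled signal. A minimal from-scratch sketch, with electrostatic potential values along a surface path standing in as the signal (an illustrative toy, not the paper's pipeline):

```python
import cmath
import math

def power_spectrum(samples):
    """Discrete Fourier power spectrum of a real-valued signal.
    Here the samples stand in for electrostatic potential values
    sampled along a path on the molecular surface (illustrative
    only; the published method defines its own sampling scheme)."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        coeff = sum(samples[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j in range(n))
        spectrum.append(abs(coeff) ** 2 / n)   # power at frequency k
    return spectrum

# A pure cosine concentrates its power at one frequency (and its mirror).
signal = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
ps = power_spectrum(signal)
```

Two signals can then be compared through their 1D spectra instead of their full 3D shapes, which is the source of the speed-up the abstract claims.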
AmalgamScope: merging annotations data across the human genome
The past years have seen enormous advances in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and differing data scope pose a challenge to jointly visualizing and integrating diverse data types. We present AmalgamScope, a new interactive software tool that assists scientists with the annotation of the human genome and, in particular, the integration of annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to integration of the data based on the annotation information and visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the tool's remote file-exchange server options.
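Integration keyed on gene identifiers, as described above, amounts to joining per-gene records from several annotation tracks. A minimal sketch with dictionaries (the field names and structure are hypothetical; AmalgamScope's actual file formats differ):

```python
def merge_annotations(*tracks):
    """Merge several annotation dicts, each keyed by gene identifier,
    into one record per gene. Mimics the identifier-based integration
    step described above; the structure is a hypothetical stand-in
    for AmalgamScope's real annotation file formats."""
    merged = {}
    for track in tracks:
        for gene_id, fields in track.items():
            # Accumulate fields from every track under the same gene key.
            merged.setdefault(gene_id, {}).update(fields)
    return merged

# Toy tracks: one from sequencing, one from a microarray platform.
ngs = {"BRCA1": {"coverage": 42}, "TP53": {"coverage": 17}}
array = {"BRCA1": {"log2fc": 1.3}}
combined = merge_annotations(ngs, array)
```

Genes present in only one track keep their partial record, so downstream visualization can still place them on the genome.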
3D structural analysis of proteins using electrostatic surfaces based on image segmentation
Herein, we present a novel strategy to analyse and characterize proteins using protein molecular electrostatic surfaces. Our approach starts by calculating a series of distinct molecular surfaces for each protein that are subsequently flattened out, thus reducing 3D information noise. RGB images are appropriately scaled by means of standard image processing techniques whilst retaining the weight information of each protein's molecular electrostatic surface. Then homogeneous areas in the protein surface are estimated based on unsupervised clustering of the 3D images, while performing similarity searches. This is a computationally fast approach, which efficiently highlights interesting structural areas among a group of proteins. Multiple protein electrostatic surfaces can be combined together and, in conjunction with their processed images, they can provide the starting material for protein structural similarity and molecular docking experiments.
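The "homogeneous areas via unsupervised clustering" step can be illustrated with the simplest such algorithm, k-means on scalar pixel intensities. This is a generic sketch of the technique class, not the paper's actual pipeline or parameters:

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny k-means on scalar pixel intensities, illustrating the
    kind of unsupervised clustering that groups homogeneous surface
    regions. Generic sketch only; the published strategy clusters
    full RGB images, not 1D intensities."""
    # Seed with the extreme values for k=2, else the first k samples.
    centroids = [min(values), max(values)] if k == 2 else list(values[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Recompute each centroid as its cluster mean; keep empty clusters fixed.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Two clearly separated intensity groups recover two centroids.
cents = sorted(kmeans_1d([10, 12, 11, 200, 205, 198], k=2))
```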
Outcome prediction based on microarray analysis: a critical perspective on methods
Background: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, owing to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance to biological relevance. There is therefore an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to compare methodologies thoroughly under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim of revealing inefficiencies in performance evaluation and addressing aspects that can reduce uncertainty in algorithmic validation. Results: A computational study is performed on the performance of several gene-selection methodologies using publicly available microarray data. Three basic types of experimental scenario are evaluated: the independent test-set, and 10-fold cross-validation (CV) using maximum and average performance measures. Feature-selection methods behave differently under different validation strategies. The performance results from CV do not match those from the independent test-set well, except for the support vector machine (SVM) and least-squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for evaluating the predictive power of algorithms. The optimal size of the selected gene-set also appears to depend on the evaluation scheme. The consistency of selected genes over variation of the training set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance. Conclusion: Multiple parameters can influence the selection of a gene signature and its predictive power, so possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and that case-specific measures reveal stability characteristics of the gene signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.
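The 10-fold cross-validation scheme compared in the study partitions the samples into ten disjoint test folds, training on the remainder each time. A minimal from-scratch sketch of the index bookkeeping (not taken from the paper, and deliberately avoiding any particular ML library):

```python
def ten_fold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists for k-fold cross-validation,
    the evaluation scheme the study compares against an independent
    test-set. From-scratch illustrative sketch; no shuffling or
    stratification, which a real evaluation would add."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(ten_fold_indices(25))
```

Every sample appears in exactly one test fold, so averaging per-fold accuracy uses each sample once, unlike repeated random splits.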
On a meaningful integration of web services in data-intensive biomedical environments: The DICODE approach
This paper reports on an innovative approach that aims to reduce information management costs in data-intensive and cognitively complex biomedical environments. Recognizing the importance of prominent high-performance computing paradigms and large-scale data processing technologies, as well as collaboration support systems, in remedying data-intensive issues, it adopts a hybrid approach built on the synergy of these technologies. The proposed approach provides innovative Web-based workbenches that integrate and orchestrate a set of interoperable services, reducing data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and to concentrate on creative activities.
Ordered weighted average based grouping of nanomaterials with Arsinh and dose response similarity models
Environmental Biology
Erratum to: Structuprint: a scalable and extensible tool for two-dimensional representation of protein surfaces
Data S1: Original datasets, raw data results, summary file results for each dataset separated in folders
Deliverable Report D4.6: Tools for generating QMRF and QPRF reports
Scientific reports carry significant importance for the straightforward and effective transfer of knowledge, results and ideas. Good practice dictates that reports should be well structured and concise. This deliverable describes the reporting services for models, predictions and validation tasks that have been integrated within the eNanoMapper (eNM) modelling infrastructure. Validation services have been added to the Jaqpot Quattro (JQ) modelling platform and the nano-lazar read-across framework developed within WP4 to support eNM modelling activities. Moreover, we have proceeded with the development of reporting services for predictions and models, namely QPRF and QMRF reports respectively. In this deliverable, we therefore first describe the three validation schemes created, namely training-set split, cross-validation and external validation, in detail, and demonstrate their functionality at both API and UI levels. We then proceed with the description of the read-across functionalities and, finally, present and describe the QPRF and QMRF reporting services.
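The first of the three validation schemes named above, a training-set split, simply partitions the dataset at random into training and held-out portions. A generic sketch of the idea (this is not the Jaqpot Quattro API, whose endpoints and parameters differ):

```python
import random

def train_test_split_ids(ids, train_fraction=0.8, seed=0):
    """Random training-set split: shuffle the sample identifiers and
    cut at the requested fraction. Generic illustration of the first
    validation scheme described above, not Jaqpot Quattro code."""
    rng = random.Random(seed)           # fixed seed for reproducible splits
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split_ids(range(10))
```

Cross-validation and external validation then differ only in where the held-out data comes from: rotated folds of the same set, or an entirely separate dataset.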