svmPRAT: SVM-based Protein Residue Annotation Toolkit
<p>Abstract</p> <p>Background</p> <p>Over the last decade, several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines, as they provide accurate and generalizable prediction models.</p> <p>Results</p> <p>We present a general-purpose protein residue annotation toolkit (<it>svm</it><monospace>PRAT</monospace>) to allow biologists to formulate residue-wise prediction problems. <it>svm</it><monospace>PRAT</monospace> formulates the annotation problem as a classification or regression problem using support vector machines. One of the key features of <it>svm</it><monospace>PRAT</monospace> is its ease of use in incorporating any user-provided information in the form of feature matrices. For every residue, <it>svm</it><monospace>PRAT</monospace> captures local information around the residue to create fixed-length feature vectors. <it>svm</it><monospace>PRAT</monospace> implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that accurately captures signals and patterns for training effective predictive models.</p> <p>Conclusions</p> <p>In this work we evaluate <it>svm</it><monospace>PRAT</monospace> on several classification and regression problems, including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. <it>svm</it><monospace>PRAT</monospace> has also been used for the development of a state-of-the-art transmembrane helix prediction method called TOPTMH and a secondary structure prediction method called YASSPP. The toolkit provides practitioners with an efficient and easy-to-use tool for a wide variety of annotation problems.</p> <p><it>Availability</it>: <url>http://www.cs.gmu.edu/~mlbio/svmprat</url></p>
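The abstract above says that local information around each residue is turned into a fixed-length feature vector. The sketch below illustrates one common way to do this: concatenate the feature rows of the residues in a symmetric window and zero-pad past the sequence ends. It is not svmPRAT's own encoding scheme, which the abstract does not specify; the function name `window_encode`, the half-width parameter `w`, and the zero-padding choice are all assumptions made for illustration.

```python
from typing import List


def window_encode(features: List[List[float]], w: int) -> List[List[float]]:
    """Build one fixed-length vector per residue (hypothetical helper).

    features: one row of per-residue features for each position in the sequence.
    w: assumed window half-width; each vector covers positions i-w .. i+w.
    Positions outside the sequence are filled with zeros (an assumption).
    """
    n = len(features)
    d = len(features[0]) if n else 0
    zero_row = [0.0] * d
    encoded = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - w, i + w + 1):
            # Use the real feature row inside the sequence, zeros outside it.
            vec.extend(features[j] if 0 <= j < n else zero_row)
        encoded.append(vec)
    return encoded


# Toy input: 4 residues with 2 features each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
vectors = window_encode(feats, w=1)
```

Every vector has length (2w + 1) × d regardless of where the residue sits in the sequence, so the vectors can be passed directly to a classifier or regressor.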
VASCo: computation and visualization of annotated protein surface contacts
<p>Abstract</p> <p>Background</p> <p>Structural data from crystallographic analyses contain a vast amount of information on protein-protein contacts. Knowledge of protein-protein interactions is essential for understanding many processes in living cells. The methods to investigate these interactions range from genetics to biophysics, crystallography, bioinformatics and computer modeling. Crystal contact information can also be useful for understanding biologically relevant protein oligomerisation, as both rely in principle on the same physico-chemical interaction forces. Visualization of crystal and biological contact data, including different surface properties, can help to analyse protein-protein interactions.</p> <p>Results</p> <p>VASCo is a program package for the calculation of protein surface properties and the visualization of annotated surfaces. Special emphasis is laid on protein-protein interactions, which are calculated based on surface point distances. The same approach is used to compare surfaces of two aligned molecules. Molecular properties such as electrostatic potential or hydrophobicity are mapped onto these surface points. Molecular surfaces and the corresponding properties are calculated using well-established programs integrated into the package, as well as custom-developed programs. The modular package can easily be extended to include new properties for annotation. The output of the program is most conveniently displayed in PyMOL using a custom-made plug-in.</p> <p>Conclusion</p> <p>VASCo supplements other available protein contact visualisation tools and provides additional information on biological interactions as well as on crystal contacts. The tool provides a unique feature to compare surfaces of two aligned molecules based on point distances and thereby facilitates the visualization and analysis of surface differences.</p>
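The abstract above states that contacts are calculated from surface point distances. A minimal sketch of that idea follows: a surface point on one molecule is flagged as a contact point if any surface point on the other molecule lies within a distance cutoff. This is not VASCo's implementation; the function name `contact_points`, the brute-force search, and the cutoff value are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_points(surface_a: Sequence[Point],
                   surface_b: Sequence[Point],
                   cutoff: float) -> List[int]:
    """Return indices of points in surface_a within `cutoff` of surface_b.

    Hypothetical helper: compares every pair of points (O(len(a) * len(b))),
    which is fine for a toy example but not for full molecular surfaces.
    """
    return [
        i
        for i, p in enumerate(surface_a)
        if any(math.dist(p, q) <= cutoff for q in surface_b)
    ]


# Toy surfaces; coordinates and the 1.5 cutoff are made-up values.
a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (10.0, 0.0, 0.5)]
contacts = contact_points(a, b, cutoff=1.5)
```

For realistic surfaces with many thousands of points, a spatial index such as a grid or k-d tree would replace the nested loop, and the same distance test could be used to compare the surfaces of two aligned molecules.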
Intrinsically disordered domains: Sequence ➔ disorder ➔ function relationships
Disordered domains are long regions of intrinsic disorder that ideally have conserved sequences, conserved disorder, and conserved functions. These domains were first noticed in protein–protein interactions that are distinct from the interactions between two structured domains and the interactions between structured domains and linear motifs or molecular recognition features (MoRFs). So far, disordered domains have not been systematically characterized. Here, we present a bioinformatics investigation of the sequence–disorder–function relationships for a set of probable disordered domains (PDDs) identified from the Pfam database. All the Pfam seed proteins from those domains with at least one PDD sequence were collected. Most often, if a set contains one PDD sequence, then all members of the set are PDDs or nearly so. However, many seed sets have sequence collections that exhibit diverse proportions of predicted disorder and structure, thus giving the completely unexpected result that conserved sequences can vary substantially in predicted disorder and structure. In addition to the induction of structure by binding to protein partners, disordered domains are also induced to form structure by disulfide bond formation, by ion binding, and by complex formation with RNA or DNA. The two new findings, (a) that conserved sequences can vary substantially in their predicted disorder content and (b) that homologues from a single domain can evolve from structure to disorder (or vice versa), enrich our understanding of the sequence ➔ disorder ensemble ➔ function paradigm.
Measurement and QCD analysis of double-differential inclusive jet cross sections in pp collisions at √s=8 TeV and cross section ratios to 2.76 and 7 TeV
A measurement of the double-differential inclusive jet cross section as a function of the jet transverse momentum pT and the absolute jet rapidity |y| is presented. Data from LHC proton-proton collisions at √s = 8 TeV, corresponding to an integrated luminosity of 19.7 fb⁻¹, have been collected with the CMS detector. Jets are reconstructed using the anti-kT clustering algorithm with a size parameter of 0.7 in a phase space region covering jet pT from 74 GeV up to 2.5 TeV and jet absolute rapidity up to |y| = 3.0. The low-pT jet range between 21 and 74 GeV is also studied up to |y| = 4.7, using a dedicated data sample corresponding to an integrated luminosity of 5.6 pb⁻¹. The measured jet cross section is corrected for detector effects and compared with the predictions from perturbative QCD at next-to-leading order (NLO) using various sets of parton distribution functions (PDF). Cross section ratios to the corresponding measurements performed at 2.76 and 7 TeV are presented. From the measured double-differential jet cross section, the value of the strong coupling constant evaluated at the Z mass is αS(MZ) = 0.1164 (+0.0060, −0.0043), where the errors include the PDF, scale, nonperturbative effects and experimental uncertainties, using the CT10 NLO PDFs. Improved constraints on PDFs based on the inclusive jet cross section measurement are presented.