1,388 research outputs found

    Composite structural motifs of binding sites for delineating biological functions of proteins

    Get PDF
    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.Comment: 34 pages, 7 figure

    A framework for protein structure classification and identification of novel protein structures

    Get PDF
    BACKGROUND: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, the need for automated and accurate methods for protein classification is increasingly important. RESULTS: In this paper we present a unified framework for protein structure classification and identification of novel protein structures. The framework consists of a set of components for comparing, classifying, and clustering protein structures. These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to provide a way of clustering the new folds. In our evaluation with SCOP 1.69, our method correctly classifies 86.0%, 87.7%, and 90.5% of new domains at family, superfamily, and fold levels. Furthermore, for protein domains that belong to new domain families, our method is able to produce clusters that closely correspond to the new families in SCOP 1.69. As a result, our method can also be used to suggest new classification groups that contain novel folds. CONCLUSION: We have developed a method called proCC for automatically classifying and clustering domains. The method is effective in classifying new domains and suggesting new domain families, and it is also very efficient. A web site offering access to proCC is freely available a

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Get PDF
    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues

    Structure-based classification of tauopathies

    Get PDF
    The ordered assembly of tau protein into filaments characterizes several neurodegenerative diseases, which are called tauopathies. It was previously reported that, by cryo-electron microscopy, the structures of tau filaments from Alzheimer’s disease, Pick’s disease, chronic traumatic encephalopathy and corticobasal degeneration are distinct. Here we show that the structures of tau filaments from progressive supranuclear palsy (PSP) define a new three-layered fold. Moreover, the structures of tau filaments from globular glial tauopathy are similar to those from PSP. The tau filament fold of argyrophilic grain disease (AGD) differs, instead resembling the four-layered fold of corticobasal degeneration. The AGD fold is also observed in ageing-related tau astrogliopathy. Tau protofilament structures from inherited cases of mutations at positions +3 or +16 in intron 10 of MAPT (the microtubule-associated protein tau gene) are also identical to those from AGD, suggesting that relative overproduction of four-repeat tau can give rise to the AGD fold. Finally, the structures of tau filaments from cases of familial British dementia and familial Danish dementia are the same as those from cases of Alzheimer’s disease and primary age-related tauopathy. These findings suggest a hierarchical classification of tauopathies on the basis of their filament folds, which complements clinical diagnosis and neuropathology and also allows the identification of new entities—as we show for a case diagnosed as PSP, but with filament structures that are intermediate between those of globular glial tauopathy and PSP

    Performance of CMS muon reconstruction in pp collision events at sqrt(s) = 7 TeV

    Get PDF
    The performance of muon reconstruction, identification, and triggering in CMS has been studied using 40 inverse picobarns of data collected in pp collisions at sqrt(s) = 7 TeV at the LHC in 2010. A few benchmark sets of selection criteria covering a wide range of physics analysis needs have been examined. For all considered selections, the efficiency to reconstruct and identify a muon with a transverse momentum pT larger than a few GeV is above 95% over the whole region of pseudorapidity covered by the CMS muon system, abs(eta) < 2.4, while the probability to misidentify a hadron as a muon is well below 1%. The efficiency to trigger on single muons with pT above a few GeV is higher than 90% over the full eta range, and typically substantially better. The overall momentum scale is measured to a precision of 0.2% with muons from Z decays. The transverse momentum resolution varies from 1% to 6% depending on pseudorapidity for muons with pT below 100 GeV and, using cosmic rays, it is shown to be better than 10% in the central region up to pT = 1 TeV. Observed distributions of all quantities are well reproduced by the Monte Carlo simulation.Comment: Replaced with published version. Added journal reference and DO

    Azimuthal anisotropy of charged particles at high transverse momenta in PbPb collisions at sqrt(s[NN]) = 2.76 TeV

    Get PDF
    The azimuthal anisotropy of charged particles in PbPb collisions at nucleon-nucleon center-of-mass energy of 2.76 TeV is measured with the CMS detector at the LHC over an extended transverse momentum (pt) range up to approximately 60 GeV. The data cover both the low-pt region associated with hydrodynamic flow phenomena and the high-pt region where the anisotropies may reflect the path-length dependence of parton energy loss in the created medium. The anisotropy parameter (v2) of the particles is extracted by correlating charged tracks with respect to the event-plane reconstructed by using the energy deposited in forward-angle calorimeters. For the six bins of collision centrality studied, spanning the range of 0-60% most-central events, the observed v2 values are found to first increase with pt, reaching a maximum around pt = 3 GeV, and then to gradually decrease to almost zero, with the decline persisting up to at least pt = 40 GeV over the full centrality range measured.Comment: Replaced with published version. Added journal reference and DO

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools.</p> <p>Results</p> <p>We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all compute that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA, (http://camera.calit2.net).</p> <p>Conclusion</p> <p>The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.</p

    Search for new physics with same-sign isolated dilepton events with jets and missing transverse energy

    Get PDF
    A search for new physics is performed in events with two same-sign isolated leptons, hadronic jets, and missing transverse energy in the final state. The analysis is based on a data sample corresponding to an integrated luminosity of 4.98 inverse femtobarns produced in pp collisions at a center-of-mass energy of 7 TeV collected by the CMS experiment at the LHC. This constitutes a factor of 140 increase in integrated luminosity over previously published results. The observed yields agree with the standard model predictions and thus no evidence for new physics is found. The observations are used to set upper limits on possible new physics contributions and to constrain supersymmetric models. To facilitate the interpretation of the data in a broader range of new physics scenarios, information on the event selection, detector response, and efficiencies is provided.Comment: Published in Physical Review Letter

    Performance of CMS muon reconstruction in pp collision events at sqrt(s) = 7 TeV

    Get PDF
    The performance of muon reconstruction, identification, and triggering in CMS has been studied using 40 inverse picobarns of data collected in pp collisions at sqrt(s) = 7 TeV at the LHC in 2010. A few benchmark sets of selection criteria covering a wide range of physics analysis needs have been examined. For all considered selections, the efficiency to reconstruct and identify a muon with a transverse momentum pT larger than a few GeV is above 95% over the whole region of pseudorapidity covered by the CMS muon system, abs(eta) < 2.4, while the probability to misidentify a hadron as a muon is well below 1%. The efficiency to trigger on single muons with pT above a few GeV is higher than 90% over the full eta range, and typically substantially better. The overall momentum scale is measured to a precision of 0.2% with muons from Z decays. The transverse momentum resolution varies from 1% to 6% depending on pseudorapidity for muons with pT below 100 GeV and, using cosmic rays, it is shown to be better than 10% in the central region up to pT = 1 TeV. Observed distributions of all quantities are well reproduced by the Monte Carlo simulation.Comment: Replaced with published version. Added journal reference and DO

    X-ray emission from the Sombrero galaxy: discrete sources

    Get PDF
    We present a study of discrete X-ray sources in and around the bulge-dominated, massive Sa galaxy, Sombrero (M104), based on new and archival Chandra observations with a total exposure of ~200 ks. With a detection limit of L_X = 1E37 erg/s and a field of view covering a galactocentric radius of ~30 kpc (11.5 arcminute), 383 sources are detected. Cross-correlation with Spitler et al.'s catalogue of Sombrero globular clusters (GCs) identified from HST/ACS observations reveals 41 X-rays sources in GCs, presumably low-mass X-ray binaries (LMXBs). We quantify the differential luminosity functions (LFs) for both the detected GC and field LMXBs, whose power-low indices (~1.1 for the GC-LF and ~1.6 for field-LF) are consistent with previous studies for elliptical galaxies. With precise sky positions of the GCs without a detected X-ray source, we further quantify, through a fluctuation analysis, the GC LF at fainter luminosities down to 1E35 erg/s. The derived index rules out a faint-end slope flatter than 1.1 at a 2 sigma significance, contrary to recent findings in several elliptical galaxies and the bulge of M31. On the other hand, the 2-6 keV unresolved emission places a tight constraint on the field LF, implying a flattened index of ~1.0 below 1E37 erg/s. We also detect 101 sources in the halo of Sombrero. The presence of these sources cannot be interpreted as galactic LMXBs whose spatial distribution empirically follows the starlight. Their number is also higher than the expected number of cosmic AGNs (52+/-11 [1 sigma]) whose surface density is constrained by deep X-ray surveys. We suggest that either the cosmic X-ray background is unusually high in the direction of Sombrero, or a distinct population of X-ray sources is present in the halo of Sombrero.Comment: 11 figures, 5 tables, ApJ in pres
    • …
    corecore