
    Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds

    Background: Since 2004, public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen for their bioactive content, availability of downloads and facility to select informative subsets.
    Results: Where they could be calculated, extracted compounds per journal article were in the range of 12 to 19, but compounds-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pairwise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a large increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public), but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not.
    Conclusion: On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years. However, the commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that commercial and public sources are complementary for its exploration.
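
    The pairwise overlap comparison described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch: the database names and compound identifiers are made-up placeholders, and it assumes each compound has already been reduced to a standardised structure key so that equal keys mean equal structures. It is not the authors' pipeline.

```python
from itertools import combinations


def pairwise_overlaps(sources):
    """Return {(name_a, name_b): (shared, only_a, only_b)} for every pair of sources.

    `sources` maps a source name to a set of standardised structure keys.
    """
    result = {}
    for a, b in combinations(sorted(sources), 2):
        set_a, set_b = sources[a], sources[b]
        shared = len(set_a & set_b)
        result[(a, b)] = (shared, len(set_a) - shared, len(set_b) - shared)
    return result


# Hypothetical toy data; real inputs would be millions of structure keys.
toy_sources = {
    "public_db": {"c1", "c2", "c3", "c4"},
    "commercial_db": {"c3", "c4", "c5"},
    "curated_db": {"c1", "c5", "c6"},
}

overlaps = pairwise_overlaps(toy_sources)
for pair, (shared, only_a, only_b) in overlaps.items():
    print(pair, "shared:", shared, "unique to first:", only_a, "unique to second:", only_b)
```

    Each tuple reports both the shared count and the content unique to either side, which mirrors the abstract's point that a comparison matrix shows overlap and unique content at the same time.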

    Prediction of potential drug targets based on simple sequence properties

    Background: During the past decades, research and development in drug discovery have attracted much attention and effort. However, only 324 drug targets are known for clinical drugs to date. Identifying potential drug targets is the first step in the modern drug discovery process for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and the pharmaceutical industry. If a protein can be predicted in advance to be a potential drug target, the drug discovery process targeting this protein will be greatly accelerated. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets.
    Results: Based on simple physicochemical properties extracted from the protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets with an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research.
    Conclusion: We have developed a drug target prediction method based solely on protein sequence information, without knowledge of family/domain annotation or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as in genome-scale drug target prediction.

    Measurement of CP-violation asymmetries in D0 to Ks pi+ pi-

    We report a measurement of time-integrated CP-violation asymmetries in the resonant substructure of the three-body decay D0 to Ks pi+ pi- using CDF II data corresponding to 6.0 fb^-1 of integrated luminosity from Tevatron proton-antiproton collisions at sqrt(s) = 1.96 TeV. The charm mesons used in this analysis come from D*+(2010) to D0 pi+ and D*-(2010) to D0bar pi-, where the production flavor of the charm meson is determined by the charge of the accompanying pion. We apply a Dalitz-amplitude analysis to describe the dynamic decay structure and use two complementary approaches: a full Dalitz-plot fit employing the isobar model for the contributing resonances, and a model-independent bin-by-bin comparison of the D0 and D0bar Dalitz plots. We find no CP-violation effects and measure an asymmetry of A_CP = (-0.05 +/- 0.57 (stat) +/- 0.54 (syst))% for the overall integrated CP-violation asymmetry, consistent with the standard model prediction. Comment: 15 pages
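
    As a reading aid for the quoted result, the sketch below combines the statistical and systematic uncertainties in quadrature and compares the central value with the total. Treating the two uncertainties as uncorrelated is an assumption made here for illustration; the abstract does not state how, or whether, the authors combine them.

```python
import math

# Values quoted in the abstract, in percent.
a_cp = -0.05
stat = 0.57
syst = 0.54

# Assumption: stat and syst are uncorrelated, so they add in quadrature.
total = math.hypot(stat, syst)

# Distance of the central value from zero in units of the total uncertainty.
pull = abs(a_cp) / total

print(f"total uncertainty: {total:.2f}%")
print(f"|A_CP| / total: {pull:.2f}")
```

    Under that assumption the central value lies well within one total standard deviation of zero, in line with the abstract's statement that no CP-violation effects are found.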

    Integrating Statistical Predictions and Experimental Verifications for Enhancing Protein-Chemical Interaction Predictions in Virtual Screening

    Predictions of interactions between target proteins and potential leads are of great benefit in the drug discovery process. We present a broadly applicable statistical prediction method for interactions between any proteins and chemical compounds, which requires only protein sequence data and chemical structure data and uses the statistical learning method of support vector machines (SVMs). Because such comprehensive predictions can involve many false positives, we propose two approaches for reducing them: (i) efficient use of multiple statistical prediction models in the framework of a two-layer SVM and (ii) careful design of the negative data used to construct the statistical prediction models. In the two-layer SVM, outputs produced by the first-layer SVM models, which are constructed with different negative samples and reflect different aspects of the classification, are used as inputs to the second-layer SVM. To design negative data that produce fewer false positive predictions, we iteratively construct SVM models, or classification boundaries, from positive and tentative negative samples and select additional negative sample candidates according to pre-determined rules. Moreover, to fully exploit the advantages of statistical learning methods, we propose a strategy for feeding experimental results back into the computational predictions while taking the biological effects of interest into account. We show the usefulness of our approach by predicting potential ligands that bind to human androgen receptors from more than 19 million chemical compounds and verifying these predictions by in vitro binding. We then use this experimental validation as feedback to enhance subsequent computational predictions, and experimentally validate those predictions in turn. This iterative procedure of in silico prediction followed by in vitro or in vivo experimental verification, with sufficient feedback, enabled us to identify novel ligand candidates that are distant from known ligands in chemical space.

    Performance of CMS muon reconstruction in pp collision events at sqrt(s) = 7 TeV

    The performance of muon reconstruction, identification, and triggering in CMS has been studied using 40 inverse picobarns of data collected in pp collisions at sqrt(s) = 7 TeV at the LHC in 2010. A few benchmark sets of selection criteria covering a wide range of physics analysis needs have been examined. For all considered selections, the efficiency to reconstruct and identify a muon with a transverse momentum pT larger than a few GeV is above 95% over the whole region of pseudorapidity covered by the CMS muon system, abs(eta) < 2.4, while the probability to misidentify a hadron as a muon is well below 1%. The efficiency to trigger on single muons with pT above a few GeV is higher than 90% over the full eta range, and typically substantially better. The overall momentum scale is measured to a precision of 0.2% with muons from Z decays. The transverse momentum resolution varies from 1% to 6% depending on pseudorapidity for muons with pT below 100 GeV and, using cosmic rays, it is shown to be better than 10% in the central region up to pT = 1 TeV. Observed distributions of all quantities are well reproduced by the Monte Carlo simulation. Comment: Replaced with published version. Added journal reference and DOI

    X-ray emission from the Sombrero galaxy: discrete sources

    We present a study of discrete X-ray sources in and around the bulge-dominated, massive Sa galaxy Sombrero (M104), based on new and archival Chandra observations with a total exposure of ~200 ks. With a detection limit of L_X = 1E37 erg/s and a field of view covering a galactocentric radius of ~30 kpc (11.5 arcmin), 383 sources are detected. Cross-correlation with Spitler et al.'s catalogue of Sombrero globular clusters (GCs) identified from HST/ACS observations reveals 41 X-ray sources in GCs, presumably low-mass X-ray binaries (LMXBs). We quantify the differential luminosity functions (LFs) for both the detected GC and field LMXBs, whose power-law indices (~1.1 for the GC LF and ~1.6 for the field LF) are consistent with previous studies of elliptical galaxies. With precise sky positions of the GCs without a detected X-ray source, we further quantify, through a fluctuation analysis, the GC LF at fainter luminosities down to 1E35 erg/s. The derived index rules out a faint-end slope flatter than 1.1 at 2 sigma significance, contrary to recent findings in several elliptical galaxies and the bulge of M31. On the other hand, the 2-6 keV unresolved emission places a tight constraint on the field LF, implying a flattened index of ~1.0 below 1E37 erg/s. We also detect 101 sources in the halo of Sombrero. The presence of these sources cannot be interpreted as galactic LMXBs whose spatial distribution empirically follows the starlight. Their number is also higher than the expected number of cosmic AGNs (52+/-11 [1 sigma]), whose surface density is constrained by deep X-ray surveys. We suggest that either the cosmic X-ray background is unusually high in the direction of Sombrero, or a distinct population of X-ray sources is present in the halo of Sombrero. Comment: 11 figures, 5 tables, ApJ in press

    Compressed representation of a partially defined integer function over multiple arguments

    In OLAP (Online Analytical Processing), data are analysed in an n-dimensional cube. The cube may be represented as a partially defined function over n arguments. Considering that the function is often not defined everywhere, we ask: is there a known way of representing the function, or the points at which it is defined, more compactly than the trivial one?
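
    One widely used baseline for a sparsely defined cube is to store only the defined cells in a hash map keyed by coordinate tuple, instead of allocating every cell. The sketch below illustrates that idea only; the class name `SparseCube` and its methods are hypothetical, and this is not presented as the answer the abstract is asking for.

```python
import math


class SparseCube:
    """Partially defined integer function over n arguments.

    Only the defined points are stored, keyed by their coordinate tuple.
    """

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._cells = {}

    def set(self, coords, value):
        coords = tuple(coords)
        in_range = len(coords) == len(self.shape) and all(
            0 <= c < size for c, size in zip(coords, self.shape)
        )
        if not in_range:
            raise IndexError(f"coordinates {coords} outside shape {self.shape}")
        self._cells[coords] = int(value)

    def get(self, coords, default=None):
        # Undefined points return `default` rather than raising.
        return self._cells.get(tuple(coords), default)

    def defined_count(self):
        return len(self._cells)

    def dense_size(self):
        # Number of cells the trivial dense representation would allocate.
        return math.prod(self.shape)


cube = SparseCube((1000, 1000, 1000))
cube.set((3, 14, 15), 92)
cube.set((0, 0, 0), 7)
cube.set((999, 999, 999), -1)

print(cube.get((3, 14, 15)), cube.get((1, 1, 1)))
print(cube.defined_count(), "defined of", cube.dense_size(), "possible cells")
```

    Storage here grows with the number of defined points rather than with the product of the dimension sizes, at the cost of storing every coordinate explicitly.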

    RAS gene polymorphisms, classical risk factors and the advent of coronary artery disease in the Portuguese population

    Background: Several polymorphisms within the renin-angiotensin system (RAS) cluster of genes have been associated with the advent of coronary artery disease (CAD) or related pathologies. We investigated the distribution of 5 of these polymorphisms in order to find any association with CAD development and to determine whether any biochemical and behavioural factors interact with genetic polymorphisms in the advent of the disease.
    Methods: ACE I/D (rs4340), ACE A11860G (rs4343), AT1R A1166C (rs5186), AGT T174M (rs4762) and AGT M235T (rs699) gene polymorphisms were analysed by PCR-RFLP in 298 CAD patients and 510 controls from Portugal. Several biochemical and behavioural markers were obtained.
    Results: ACE I/D DD and ACE11860 GG genotypes are risk factors for CAD in this population. The simultaneous presence of ACE I/D I and ACE11860 A alleles corresponds to a significant trend towards a decrease in CAD incidence. We found several synergistic effects between the studied polymorphisms and classical risk factors such as hypertension, obesity, diabetes and dyslipidaemia: the presence of the DD genotype of ACE I/D (and also ACE11860 GG) increases the odds of developing CAD when associated with each of these classical risk factors, particularly in the male and early-onset CAD subgroup analyses; AGT235 TT also increases the CAD risk in the presence of hypertension and dyslipidaemia, and AT1R1166 interacts positively with hypertension, smoking and obesity.
    Conclusion: ACE polymorphisms were shown to play a major role in individual susceptibility to developing CAD. There is also a clear interaction between RAS predisposing genes and some biochemical/environmental risk factors in CAD onset, demonstrating a significant enhancement of classical markers, particularly by ACE I/D and ACE11860.