Screening Frequency.
<p>The number of distinct compound-target combinations screened in multiple assays, listed for increasing numbers of assays.</p>
Bioactivity of drug-target bicluster 1.
<p>The vertical axis lists the drugs in this bicluster by common name, and the horizontal axis gives the UniProt names of the representative targets of each sequence-similar target cluster. Each compound-target pair is shown in one of six colors: untested in PubChem BioAssay (black), inactive in PubChem BioAssay (grey), active in PubChem BioAssay (dark green), untested but annotated as active in DrugBank (green), inactive in PubChem BioAssay but annotated as active in DrugBank (blue), and active in PubChem BioAssay and also annotated as active in DrugBank (light green). Rows and columns are sorted by bioactivity profile similarity.</p>
Screening frequency of FDA approved and non-FDA compounds against increasing numbers of protein targets.
<p>Data is included from all assay experiments in PubChem BioAssay annotated with one clearly defined protein target, and reporting an active score for at least one small molecule. Multiple assays against the same target are counted only once.</p>
Bioactivity data mining strategy.
<p>Public bioactivity data was first summarized in a compound-target bioactivity matrix (<b>A</b>). Protein targets and small molecules were clustered by sequence (<b>B</b>) and structure (<b>C</b>) respectively, and compound-target sets with shared bioactivity profiles were identified with biclustering (<b>D</b>). For small molecules, the distributions of (<b>E</b>) target selectivity (the number of active targets) and (<b>F</b>) hit ratio (the fraction of screened targets that are active) were quantified. For protein targets, enriched GO (Gene Ontology) terms (<b>G</b>) among proteins with common bioactivity were identified, and a network (<b>H</b>) was constructed which connects target proteins with similar bioactivity profiles. These analyses highlight several interesting bioactivity patterns, identify promiscuous and selective compounds, and identify druggable protein targets and protein domains.</p>
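<p>As a minimal sketch of quantities (<b>E</b>) and (<b>F</b>) above, the following computes target selectivity and hit ratio per compound from a list of assay outcomes. The record layout, the identifiers, and the rule that a target counts as active if any assay against it is active are assumptions made for illustration, not the authors' pipeline.</p>

```python
from collections import defaultdict

# Hypothetical records: (compound_id, target_id, outcome), one per assay result.
records = [
    ("cid1", "P1", "active"),
    ("cid1", "P1", "inactive"),  # repeat assay against the same target
    ("cid1", "P2", "inactive"),
    ("cid1", "P3", "active"),
    ("cid2", "P1", "inactive"),
    ("cid2", "P2", "inactive"),
]


def selectivity_and_hit_ratio(records):
    """Return {compound: (target_selectivity, hit_ratio)}.

    target_selectivity = number of distinct active targets
    hit_ratio          = active targets / screened targets
    Assumption: a target is active if at least one assay reports it active.
    """
    screened = defaultdict(set)
    active = defaultdict(set)
    for cid, target, outcome in records:
        screened[cid].add(target)  # sets count each target only once
        if outcome == "active":
            active[cid].add(target)
    return {
        cid: (len(active[cid]), len(active[cid]) / len(screened[cid]))
        for cid in screened
    }


result = selectivity_and_hit_ratio(records)
for cid, (selectivity, hit_ratio) in sorted(result.items()):
    print(cid, selectivity, round(hit_ratio, 3))
```

<p>Using sets per compound means repeated assays against one target do not inflate either quantity.</p>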
Top Pfam domains in each bicluster.
<p>Shown are the 16 highest-scoring drug-target biclusters with more than one compound and more than one target. Columns 2 and 3 give the number of drugs (cids) and targets, respectively. Columns 4 and 5 give the name of the most abundant domain and its frequency, respectively. The last (6th) column shows the BicBin score, which reflects the density and size of the bicluster. The BicBin score is the negative exponent of the Chernoff bound, so a higher score corresponds to a lower probability of the bicluster occurring by random chance, as described in Methods.</p>
Mixture distribution of hit ratios.
<p>The probability density of hit ratios (<i>θ</i>) shown here is an equally weighted convex combination of the hit ratio probabilities for individual compounds; it represents the probability that any individual compound from a set has a specific hit ratio. Smoothing was applied to reduce sampling noise in low-probability regions. The colored bars highlight a region of each probability distribution, with arrows pointing to a close-up plot of the probability density in that region. <b>(A)</b> Hit ratio distributions for FDA approved compounds vs non-FDA approved compounds. <b>(B)</b> Hit ratio distributions for aggregator compounds vs non-aggregators. <b>(C)</b> Hit ratio distributions for PAINS vs non-PAINS.</p>
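<p>One way to picture an equally weighted mixture of per-compound hit ratio distributions is sketched below. It assumes each compound's hit ratio is described by a Beta distribution derived from its active and screened target counts under a uniform prior; that modeling choice, the function names, and the example counts are assumptions for illustration and may differ from the method described in the paper. No smoothing is applied here.</p>

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, valid for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def mixture_density(theta, counts):
    """Equally weighted mixture over compounds.

    counts holds one (n_active, n_screened) pair per compound.
    Assumption: compound i contributes Beta(n_active + 1, n_inactive + 1).
    """
    densities = [beta_pdf(theta, k + 1, n - k + 1) for k, n in counts]
    return sum(densities) / len(densities)


# Hypothetical compounds: (active targets, screened targets).
counts = [(0, 10), (1, 20), (5, 8)]

# Midpoint-rule check that the mixture is still a probability density.
grid = [(i + 0.5) / 1000 for i in range(1000)]
area = sum(mixture_density(t, counts) for t in grid) / len(grid)
print(round(area, 3))
```

<p>Because every component integrates to one and the weights are equal and sum to one, the mixture also integrates to one, which the numerical check approximates.</p>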
Molecular Function Gene Ontology (MF GO) slim term enrichment vs domain selectivity.
<p>Pfam domains are binned by the median domain selectivity of active compounds against targets with these domains, as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0171413#pone.0171413.t003" target="_blank">Table 3</a>. The domains in each bin were computed separately for FDA approved and non-FDA compounds, shown here side by side. For each bin of domain selectivity, the enrichment of MF GO slim terms against the background of all bins is shown. Enriched terms are sorted in increasing order of the lowest p-value obtained, and all terms shown here have a p-value < 0.05. The dot plot in the right column shows the number of protein targets in PubChem BioAssay annotated with each MF GO slim term.</p>
Molecular Function Gene Ontology Slim (MF GO Slim) term enrichment for each drug-target bicluster.
<p>Enrichment was measured with a hypergeometric test. Terms with <i>p</i> ≤ 0.05 are shown, sorted in increasing order.</p>
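<p>For readers unfamiliar with the test, a minimal sketch of a one-sided hypergeometric enrichment p-value follows. The function name and the example counts are hypothetical, and the choice of background set and any multiple-testing correction used in the paper are not reproduced here.</p>

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric.

    population: total number of targets in the background
    annotated:  background targets carrying the GO term
    drawn:      targets in the bicluster
    observed:   bicluster targets carrying the GO term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical example: 100 background targets, 10 carry the term,
# and 3 of the 5 targets in a bicluster carry it.
p = hypergeom_enrichment_p(100, 10, 5, 3)
print(p <= 0.05)
```

<p>Summing the upper tail gives the probability of seeing at least the observed number of annotated targets by chance, which is the usual over-representation form of the test.</p>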
Frequency of Pfam domains binned by median domain selectivity of active compounds.
<p>Each row represents a set of Pfam domains whose active compounds (against targets with that domain) have a median domain selectivity in the range specified. Domain selectivity is defined as in the “Target Selectivity Distribution” section above, except that active targets sharing a common domain are counted only once. The ranges are ordered from top to bottom by increasing number of active targets with distinct domains. We report bin counts separately for FDA approved and non-FDA compounds.</p>
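<p>The idea of counting active targets that share a domain only once, and then taking a per-domain median, can be sketched as follows. The mapping of each target to a single representative domain, the identifiers, and the function names are simplifying assumptions for illustration; the handling of multi-domain proteins described in Methods is not reproduced.</p>

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: one representative Pfam domain per target (an assumption).
target_domain = {"P1": "Pkinase", "P2": "Pkinase", "P3": "7tm_1", "P4": "Trypsin"}

# Hypothetical active targets per compound.
active_targets = {
    "cid1": {"P1", "P2"},        # two targets, one shared domain
    "cid2": {"P1", "P3", "P4"},  # three targets, three domains
    "cid3": {"P3"},
}


def domain_selectivity(targets, target_domain):
    """Number of distinct domains among a compound's active targets."""
    return len({target_domain[t] for t in targets})


def median_selectivity_by_domain(active_targets, target_domain):
    """For each domain, the median domain selectivity of its active compounds."""
    per_domain = defaultdict(list)
    for cid, targets in active_targets.items():
        selectivity = domain_selectivity(targets, target_domain)
        for domain in {target_domain[t] for t in targets}:
            per_domain[domain].append(selectivity)
    return {domain: median(values) for domain, values in per_domain.items()}


medians = median_selectivity_by_domain(active_targets, target_domain)
for domain in sorted(medians):
    print(domain, medians[domain])
```

<p>Domains could then be grouped into ranges of these medians to obtain bin counts like those in the table.</p>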
Frequency of active PubChem BioAssay compounds across protein target domains.
<p>The target proteins represented in PubChem BioAssay have been classified by Pfam protein domains present in the <i>H. sapiens</i> proteome (vertical axis). We report data for all proteins which encode a Pfam domain present in the <i>H. sapiens</i> proteome, even if the assay was performed against a protein from another species. We show here only domains with at least 100 amino acid residues in the homology model, to avoid small repeats and domains unlikely to be drug targets. Additionally, we report for multi-domain clusters only the most frequent and functionally descriptive members as outlined in the Methods section (see “De-duplication of Single Domain Clusters”). Domains of unknown function (DUFs) were also removed since they are rarely the functional target of bioassays. The number of targets with each domain among the PubChem BioAssay data, and within the <i>H. sapiens</i> proteome (all proteins, including those not screened in PubChem BioAssay), is shown on the right in both plots. <b>(A)</b> The top 30 Pfam domains with the greatest number of active FDA approved drugs, in decreasing order. <b>(B)</b> The top 34 Pfam domains with the greatest number of non-FDA compounds, but no active FDA approved drugs, in decreasing order. A full table is provided in the <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0171413#pone.0171413.s010" target="_blank">S2 File</a> of Supporting Information including the number of active compounds for each domain, non-<i>H. sapiens</i> domains, all domains occurring on the same proteins, and domains with fewer than 100 residues.</p>