11 research outputs found

    Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

    Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks, this assumption is undesirable. For example, when performing entity resolution, the size of each cluster is often unrelated to the size of the data set. Consequently, each cluster contains a negligible fraction of the total number of data points. Such tasks therefore require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new model that exhibits this property. We compare this model to several commonly used clustering models by checking model fit using real and simulated data sets.

    AI is a viable alternative to high throughput screening: a 318-target study

    High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, high-quality X-ray crystal structures, or manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.

    Flexible models for microclustering with application to entity resolution

    Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
    Comment: 15 pages, 3 figures, 1 table, to appear in NIPS 2016. arXiv admin note: text overlap with arXiv:1512.0079

    Construction and characterization of a conditionally active version of the serine/threonine kinase Akt

    Akt is a serine/threonine kinase that requires a functional phosphatidylinositol 3-kinase to be stimulated by insulin and other growth factors. When directed to membranes by the addition of a src myristoylation sequence, Akt becomes constitutively active. In the present study, a conditionally active version of Akt was constructed by fusing the Akt containing the myristoylation sequence to the hormone binding domain of a mutant murine estrogen receptor that selectively binds 4-hydroxytamoxifen. The chimeric protein was expressed in NIH3T3 cells and was shown to be stimulated 17-fold after only 20 min of hormone treatment. This hormone treatment also stimulated an approximately 3-fold increase in the phosphorylation of the chimeric protein and a shift in its migration on SDS gels. Activation of this conditionally active Akt resulted in the rapid stimulation of the 70-kDa S6 kinase. In these cells, this conditionally active Akt was also found to rapidly stimulate the phosphorylation of PHAS-I, a key protein in the regulation of protein synthesis. The conditionally active Akt, when expressed in 3T3-L1 adipocytes, was also stimulated, although its rate and extent of activation were less than in the NIH3T3 cells. Its stimulation was shown to be capable of inducing glucose uptake into adipocytes by stimulating translocation of the insulin-responsive glucose transporter GLUT4 to the plasma membrane.