
    Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models

    We consider the problem in regression analysis of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling leads to insights about the subpopulations and their associations with the regression target. We introduce a discriminative model that simultaneously learns cadre assignment and target-prediction rules. Sparsity-inducing priors are placed on the model parameters, under which independent feature selection is performed for both the cadre assignment and target-prediction processes. We learn models using adaptive step-size stochastic gradient descent, and we assess cadre quality with bootstrapped sample analysis. We present simulated results showing that, when the true clustering rule does not depend on the entire set of features, our method significantly outperforms methods that learn subpopulation-discovery and target-prediction rules separately. In a materials-by-design case study, our model provides state-of-the-art prediction of polymer glass transition temperature. Importantly, the method identifies cadres of polymers that respond differently to structural perturbations, thus providing design insight for targeting or avoiding specific transition temperature ranges. It identifies chemically meaningful cadres, each with interpretable models. Further experimental results show that cadre methods have generalization that is competitive with linear and nonlinear regression models and can identify robust subpopulations. Comment: 8 pages, 6 figures
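The abstract does not give the functional form of the cadre model, so the following is only a minimal sketch of one way a joint assignment-and-prediction rule could look: a softmax over feature-weighted distances to cadre centers, combined with one linear model per cadre. The names `cadre_predict`, `centers`, `gate_weights`, `W`, `b`, and `gamma` are hypothetical. Zero entries in `gate_weights` and `W` stand in for the effect of the sparsity-inducing priors, showing how assignment and prediction can use different feature subsets.

```python
import numpy as np

def cadre_predict(X, centers, gate_weights, W, b, gamma=1.0):
    """Soft cadre assignment followed by a cadre-weighted linear prediction.

    Assumed shapes (illustrative only, not taken from the paper):
      X            (n, p)  observations
      centers      (M, p)  one center per cadre
      gate_weights (p,)    nonnegative feature weights for assignment
      W            (p, M)  one linear model per cadre (columns)
      b            (M,)    per-cadre intercepts
    """
    # Feature-weighted squared distance from each observation to each center.
    diff = X[:, None, :] - centers[None, :, :]                    # (n, M, p)
    dist = np.sum(gate_weights[None, None, :] * diff ** 2, axis=2)  # (n, M)

    # Softmax over negative distances gives cadre membership probabilities.
    logits = -gamma * dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    G = np.exp(logits)
    G /= G.sum(axis=1, keepdims=True)

    # Each cadre has its own linear predictor; mix them by membership.
    scores = X @ W + b                                            # (n, M)
    return np.sum(G * scores, axis=1), G

# Toy usage: assignment depends only on feature 0, prediction only on feature 1.
X = np.array([[0.0, 5.0], [10.0, 5.0]])
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
gate_weights = np.array([1.0, 0.0])
W = np.array([[0.0, 0.0], [1.0, 2.0]])
b = np.zeros(2)
pred, G = cadre_predict(X, centers, gate_weights, W, b)
```

In this toy setup the two observations fall into different cadres and therefore receive different slopes on the second feature, which is the kind of subpopulation-specific response the abstract describes.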

    Chemometric Analysis of Ligand Receptor Complementarity: Identifying Complementary Ligands Based on Receptor Information (CoLiBRI)

    We have developed a novel structure-based approach to search for Complementary Ligands Based on Receptor Information (CoLiBRI). CoLiBRI is based on the representation of both receptor binding sites and their respective ligands in a space of universal chemical descriptors. The binding site atoms involved in the interaction with ligands are identified by means of a computational geometry technique known as Delaunay tessellation, applied to x-ray characterized ligand-receptor complexes. Multiple TAE/RECON chemical descriptors are calculated independently for each ligand as well as for its active site atoms. The representation of both ligands and active sites using chemical descriptors allows the application of well-known chemometric techniques in order to correlate chemical similarities between active sites and their respective ligands. From these calculations, we have established a protocol to map patterns of nearest neighbor active site vectors in a multidimensional TAE/RECON space onto those of their complementary ligands, and vice versa. This protocol affords the prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active site vector. This prediction is followed by chemical similarity calculations between this virtual ligand vector and those calculated for molecules in a chemical database to identify real compounds most similar to the virtual ligand. Consequently, the knowledge of the receptor active site structure affords straightforward and efficient identification of its complementary ligands in large databases of chemical compounds using rapid chemical similarity searches. Conversely, starting from the ligand chemical structure, one may identify possible complementary receptor cavities as well. We have applied the CoLiBRI approach to a dataset of 800 x-ray characterized ligand receptor complexes in the PDBbind database.
Using a k nearest neighbor (kNN) pattern recognition approach and variable selection, we have shown that knowledge of the active site structure affords identification of its complementary ligand among the top 1% of a large chemical database in over 90% of all test active sites when a binding site of the same protein family was present in the training set. In the case where test receptors are highly dissimilar and not present among the receptor families in the training set, the prediction accuracy is decreased; however, CoLiBRI was still able to quickly eliminate 75% of the chemical database as improbable ligands. The CoLiBRI approach provides an efficient prescreening tool for large chemical databases prior to traditional, yet much more computationally intensive, three-dimensional docking approaches.
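The two-step protocol described above (map an active site vector to a virtual ligand vector via its nearest neighbors, then rank a database by similarity to that virtual ligand) can be sketched as follows. This is an illustration only: the Euclidean metric, the unweighted mean over neighbors, and the function names `predict_virtual_ligand` and `rank_database` are assumptions, and the variable selection step mentioned in the abstract is omitted.

```python
import numpy as np

def predict_virtual_ligand(query_site, train_sites, train_ligands, k=3):
    """Estimate a virtual ligand descriptor vector for a query active site.

    Finds the k training active sites closest to the query (Euclidean
    distance, an assumption) and averages the descriptor vectors of the
    ligands bound to those sites.
    """
    distances = np.linalg.norm(train_sites - query_site, axis=1)
    nearest = np.argsort(distances)[:k]
    return train_ligands[nearest].mean(axis=0)

def rank_database(virtual_ligand, database):
    """Return database row indices ordered from most to least similar."""
    distances = np.linalg.norm(database - virtual_ligand, axis=1)
    return np.argsort(distances)

# Toy data: 2-D descriptors stand in for the real descriptor space.
train_sites = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
train_ligands = np.array([[1.0, 1.0], [3.0, 1.0], [50.0, 50.0]])
database = np.array([[100.0, 100.0], [2.0, 1.1], [0.0, 0.0]])

virtual = predict_virtual_ligand(np.array([0.4, 0.0]),
                                 train_sites, train_ligands, k=2)
ranking = rank_database(virtual, database)
```

Keeping only the top fraction of `ranking` corresponds to the prescreening use described in the abstract, where most of the database is discarded before docking.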

    Application of finite element modeling and viscoelasticity theory in characterization and prediction of dielectric relaxation process in polymer nanodielectrics

    Nanodielectrics, typically defined as polymer composites with nanosized ceramic fillers, have demonstrated significant improvements in electrical endurance, breakdown strength and dielectric constant relative to their constituent materials, which leads to enhanced energy storage capabilities. The large interfacial area surrounding nanofillers is essential to this enhancement, yet quantitative models that predict the altered dielectric properties in the interfacial area are rare. In this presentation, we apply a finite element modeling approach, originally developed for viscoelasticity analysis, to predict the frequency and temperature dependence of dielectric permittivity spectra in polymer nanodielectrics containing functionalized silica fillers. The dispersion state of nanofillers in the finite element model is determined from descriptor-based analysis of scanning electron micrographs, and the interfacial area surrounding the fillers is explicitly configured into the geometry. The dielectric permittivity spectra of the polymer matrix are imported into the model using a series of Debye relaxation functions. The analogy between dielectric permittivity and viscoelastic modulus allows for a simple mathematical conversion between the two physically distinct quantities, which enables the use of a Prony series when fitting the dielectric spectrum. With the assistance of a previously developed algorithm for fitting the viscoelastic modulus, the parameters of the Debye relaxation series are obtained. Using the above morphology and physical property inputs, dielectric spectroscopy experiments over a range of frequencies and temperatures can be simulated. Properties of the interfacial region are obtained through an iterative comparison between model output and experimental results.
It is observed that the distribution of dielectric relaxation times of the interface can be expressed using those of the polymer matrix multiplied by frequency shift factors that vary with the functionalization of the silica filler surfaces. Our results indicate that surface energy parameters of the filler and the polymer matrix can vary the dielectric response of the composites, which is consistent with earlier observations of the viscoelastic properties of polymer nanocomposites. Further discussion of the results also provides insight into the underlying dielectric relaxation mechanism in the interfacial area.
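A series of Debye relaxation functions, as mentioned above, has the standard form ε*(ω) = ε∞ + Σ Δε_k / (1 + iωτ_k). The sketch below evaluates that sum and applies a single multiplicative shift to the matrix relaxation times to stand in for the interface. The function names and the choice to apply the shift factor directly to the relaxation times are assumptions made for illustration; the abstract does not give the fitted parameters or the exact shifting convention.

```python
import numpy as np

def debye_series(omega, eps_inf, delta_eps, tau):
    """Complex permittivity of a sum of Debye relaxation terms.

    omega     angular frequencies (rad/s), array-like
    eps_inf   high-frequency permittivity limit
    delta_eps relaxation strength of each Debye term
    tau       relaxation time of each Debye term (s)
    """
    omega = np.atleast_1d(np.asarray(omega, dtype=float))[:, None]
    delta_eps = np.asarray(delta_eps, dtype=float)[None, :]
    tau = np.asarray(tau, dtype=float)[None, :]
    return eps_inf + np.sum(delta_eps / (1.0 + 1j * omega * tau), axis=1)

def shift_relaxation_times(tau_matrix, shift_factor):
    """Interface relaxation times as matrix times scaled by one factor
    (assumed convention: the factor multiplies the relaxation times)."""
    return np.asarray(tau_matrix, dtype=float) * shift_factor

# Hypothetical matrix parameters, chosen only to exercise the functions.
tau_matrix = np.array([1e-3, 1e-5])
delta_eps = np.array([2.0, 1.0])
eps_inf = 3.0

omega = np.logspace(1, 7, 50)
eps_matrix = debye_series(omega, eps_inf, delta_eps, tau_matrix)
tau_interface = shift_relaxation_times(tau_matrix, 10.0)
eps_interface = debye_series(omega, eps_inf, delta_eps, tau_interface)

# With the convention eps* = eps' - i*eps'', the loss is minus the imaginary part.
loss_matrix = -eps_matrix.imag
```

Because each term has the same form as a Prony series term for a viscoelastic modulus, a fitting routine written for one quantity can, in principle, be reused for the other after the conversion the abstract mentions.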