
    A Self-Organizing Algorithm for Modeling Protein Loops

    Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of lengths 4, 8, and 12 that have been used in previous studies.
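    The distance-adjustment step at the heart of such a scheme can be illustrated with a minimal sketch (a hypothetical illustration under simplified assumptions, not the authors' implementation): pairs of atoms are repeatedly chosen at random and nudged toward target distances while the anchor atoms are held fixed.

```python
import numpy as np

def adjust_distances(coords, targets, fixed, n_iter=50000, lr=0.5, rng=None):
    """Iteratively nudge randomly chosen atom pairs toward target distances.

    coords  : (N, 3) array of Cartesian coordinates (modified in place)
    targets : dict mapping (i, j) atom-index pairs to desired distances
    fixed   : set of atom indices (e.g. the loop anchors) that must not move
    """
    rng = rng or np.random.default_rng()
    pairs = list(targets.items())
    for _ in range(n_iter):
        (i, j), d_target = pairs[rng.integers(len(pairs))]
        delta = coords[j] - coords[i]
        d = np.linalg.norm(delta)
        if d < 1e-9:
            continue
        # Each free atom absorbs half of the required correction along the pair vector.
        corr = lr * (d - d_target) / (2.0 * d) * delta
        if i not in fixed:
            coords[i] += corr
        if j not in fixed:
            coords[j] -= corr
    return coords

# Toy example: a 6-atom chain with ~1.5 A target distances between consecutive
# atoms, and the first and last atoms fixed as anchors 6 A apart.
rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))
coords[0], coords[5] = np.array([0.0, 0.0, 0.0]), np.array([6.0, 0.0, 0.0])
targets = {(k, k + 1): 1.5 for k in range(5)}
adjust_distances(coords, targets, fixed={0, 5}, rng=rng)
```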

    Accurate prediction of clinical stroke scales and improved biomarkers of motor impairment from robotic measurements

    Objective: One of the greatest challenges in clinical trial design is dealing with the subjectivity and variability introduced by human raters when measuring clinical endpoints. We hypothesized that robotic measures that capture the kinematics of human movements collected longitudinally in patients after stroke would bear a significant relationship to the ordinal clinical scales and potentially lead to the development of more sensitive motor biomarkers that could improve the efficiency and cost of clinical trials. Materials and methods: We used clinical scales and a robotic assay to measure arm movement in 208 patients 7, 14, 21, 30 and 90 days after acute ischemic stroke at two separate clinical sites. The robots are low-impedance, low-friction interactive devices that precisely measure speed, position and force, so that even a hemiparetic patient can generate a complete measurement profile. These profiles were used to develop predictive models of the clinical assessments employing a combination of artificial ant colonies and neural network ensembles. Results: The resulting models replicated commonly used clinical scales to a cross-validated R² of 0.73, 0.75, 0.63 and 0.60 for the Fugl-Meyer, Motor Power, NIH stroke and modified Rankin scales, respectively. Moreover, when suitably scaled and combined, the robotic measures demonstrated a significant increase in effect size from day 7 to 90 over historical data (1.47 versus 0.67). Discussion and conclusion: These results suggest that it is possible to derive surrogate biomarkers that can significantly reduce the sample size required to power future stroke clinical trials.
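    For orientation only, the two statistics quoted above can be reproduced on synthetic data along the following lines; the regressor is a generic stand-in, not the ant-colony/neural-network-ensemble pipeline used in the study, and every array here is a placeholder.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 208 patients x 10 robotic kinematic features, and a
# clinical scale loosely related to those features.
X = rng.normal(size=(208, 10))
y_scale = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=208)

# Cross-validated R^2 of a generic regressor against the clinical scale.
pred = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y_scale, cv=5)
print("cross-validated R^2:", r2_score(y_scale, pred))

# Standardized effect size of recovery: mean day-7 -> day-90 change divided by
# the standard deviation of that change (placeholder scores).
score_day7 = rng.normal(loc=30.0, scale=10.0, size=208)
score_day90 = score_day7 + rng.normal(loc=12.0, scale=8.0, size=208)
change = score_day90 - score_day7
print("effect size:", change.mean() / change.std(ddof=1))
```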

    On the Use of Information Theory for Assessing Molecular Diversity

    In a recent article published in Molecules, Lin presented a novel approach for assessing molecular diversity based on Shannon’s information theory. In this method, a set of compounds is viewed as a static collection of microstates that can register information about their environment at some predetermined capacity. Diversity is directly related to the information conveyed by the population, as quantified by Shannon’s classical entropy equation. Despite its intellectual appeal, this method is characterized by a strong tendency to oversample remote areas of the feature space and produce unbalanced designs. This paper demonstrates this limitation with some simple examples and provides a rationale for the failure of the method to produce results that are consistent with other traditional methodologies.
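    A minimal sketch of the entropy-based diversity measure (the binning scheme and data are illustrative, not Lin's exact formulation) shows how a handful of remote singletons can raise the score of an otherwise unbalanced design:

```python
import numpy as np

def shannon_diversity(descriptor_values, bin_edges):
    """Shannon entropy (bits) of a compound set binned on a 1-D descriptor."""
    counts, _ = np.histogram(descriptor_values, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

edges = np.linspace(-10, 10, 21)              # fixed cells over the descriptor range
rng = np.random.default_rng(1)
core = rng.normal(loc=0.0, scale=0.5, size=100)

set_a = core                                   # compact, well-sampled region
set_b = np.concatenate([core[:95], [5.0, 6.0, 7.0, 8.0, 9.0]])  # plus 5 remote singletons

print("compact set        :", shannon_diversity(set_a, edges))
print("with remote points :", shannon_diversity(set_b, edges))
# set_b scores as "more diverse", so entropy-driven selection rewards picking
# isolated outliers -- the oversampling tendency discussed above.
```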

    The ABCD of data management


    A geodesic framework for analyzing molecular similarities

    A fast self-organizing algorithm for extracting the minimum number of independent variables that can fully describe a set of observations was recently described (Agrafiotis, D. K.; Xu, H. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 15869-15872). The method, called stochastic proximity embedding (SPE), attempts to generate low-dimensional Euclidean maps that best preserve the similarities between a set of related objects. Unlike conventional multidimensional scaling (MDS) and nonlinear mapping (NLM), SPE preserves only local relationships and, by doing so, reveals the intrinsic dimensionality and metric structure of the data. Its success depends critically on the choice of the neighborhood radius, which should be consistent with the local curvature of the underlying manifold. Here, we describe a procedure for determining that radius by examining the tradeoff between the stress function and the number of connected components in the neighborhood graph and show that it can be used to produce meaningful maps in any embedding dimension. The power of the algorithm is illustrated in two major areas of computational drug design: conformational analysis and diversity profiling of large chemical libraries.
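    A compact sketch of the two ingredients discussed above, assuming a standard SPE update rule and a simple connected-component count on the neighborhood graph (toy data; not the authors' code):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def spe_embed(D, r_c, dim=2, n_cycles=200, n_updates=20000, lam=1.0, rng=None):
    """Minimal stochastic proximity embedding.

    D   : (N, N) matrix of input dissimilarities
    r_c : neighborhood radius; only local pairs (or violated non-local pairs)
          contribute to the coordinate updates
    """
    rng = rng or np.random.default_rng()
    n = D.shape[0]
    X = rng.uniform(-1, 1, size=(n, dim))
    for cycle in range(n_cycles):
        lr = lam * (1.0 - cycle / n_cycles)        # learning rate annealed toward zero
        for _ in range(n_updates // n_cycles):
            i, j = rng.integers(n, size=2)
            if i == j:
                continue
            d = np.linalg.norm(X[i] - X[j]) + 1e-12
            r = D[i, j]
            if r < r_c or d < r:                   # local pair, or mapped too close
                grad = lr * 0.5 * (r - d) / d * (X[i] - X[j])
                X[i] += grad
                X[j] -= grad
    return X

def n_components(D, r_c):
    """Number of connected components of the neighborhood graph at radius r_c."""
    adj = csr_matrix((D < r_c) & (D > 0))
    return connected_components(adj, directed=False)[0]

# Toy data: points on a noisy circle; scan radii to see where the neighborhood
# graph first collapses into a single connected component.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
pts = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.02, size=(200, 2))
D = squareform(pdist(pts))
for r_c in (0.05, 0.1, 0.2, 0.5):
    print(r_c, n_components(D, r_c))
X = spe_embed(D, r_c=0.2, rng=rng)
```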

    Nonlinear Mapping Networks

    Among the many dimensionality reduction techniques that have appeared in the statistical literature, multidimensional scaling and nonlinear mapping are unique for their conceptual simplicity and ability to reproduce the topology and structure of the data space in a faithful and unbiased manner. However, a major shortcoming of these methods is their quadratic dependence on the number of objects scaled, which imposes severe limitations on the size of data sets that can be effectively manipulated. Here we describe a novel approach that combines conventional nonlinear mapping techniques with feed-forward neural networks, and allows the processing of data sets orders of magnitude larger than those accessible with conventional methodologies. Rooted in the principle of probability sampling, the method employs a classical algorithm to project a small random sample, and then “learns” the underlying nonlinear transform using a multilayer neural network trained with the back-propagation algorithm. Once trained, the neural network can be used in a feed-forward manner to project the remaining members of the population as well as new, unseen samples with minimal distortion. Using examples from the fields of image processing and combinatorial chemistry, we demonstrate that this method can generate projections that are virtually indistinguishable from those derived by conventional approaches. The ability to encode the nonlinear transform in the form of a neural network makes nonlinear mapping applicable to a wide variety of data mining applications involving very large data sets that are otherwise computationally intractable.
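    The sample-then-learn idea can be sketched as follows, using scikit-learn's MDS and MLPRegressor as generic stand-ins for the classical mapping algorithm and the back-propagation network described above (illustrative only):

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
data = rng.normal(size=(20000, 10))          # large population of descriptor vectors

# 1. Project a small random sample with a classical, quadratic-cost method.
idx = rng.choice(len(data), size=500, replace=False)
sample = data[idx]
sample_2d = MDS(n_components=2, random_state=0).fit_transform(sample)

# 2. "Learn" the sample's nonlinear transform with a feed-forward network.
net = MLPRegressor(hidden_layer_sizes=(30, 30), max_iter=2000, random_state=0)
net.fit(sample, sample_2d)

# 3. Project the remaining population (and any new compounds) in a single
#    feed-forward pass, avoiding the quadratic cost of mapping all 20,000 points.
full_projection = net.predict(data)
print(full_projection.shape)
```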

    Scalable Methods for the Construction and Analysis of Virtual Combinatorial Libraries (Combinatorial Chemistry & High-Throughput Screening, in press)

    One can distinguish between two kinds of virtual combinatorial libraries: “viable” and “accessible”. Viable libraries are relatively small in size, are assembled from readily available reagents that have been filtered by the medicinal chemist, and often have a physical counterpart. Conversely, accessible libraries can encompass millions or billions of structures, typically include all possible reagents that are in principle compatible with a particular reaction scheme, and can never be physically synthesized in their entirety. Although the analysis of viable virtual libraries is relatively straightforward, the handling of large accessible libraries requires methods that scale well with respect to library size. In this work, we present novel, efficient and scalable techniques for the construction, analysis, and in silico screening of massive virtual combinatorial libraries.
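    A toy illustration of the scale argument (hypothetical reagent names and a placeholder combination step, not the techniques of the paper): a two-component library grows as the product of the reagent list sizes, so accessible libraries must be enumerated lazily or sampled rather than materialized.

```python
import itertools

# Hypothetical reagent lists for a two-component coupling reaction.
amines = [f"amine_{i}" for i in range(2000)]
acids = [f"acid_{j}" for j in range(5000)]

# The accessible library is the full product: 2,000 x 5,000 = 10 million products.
print("library size:", len(amines) * len(acids))

# Lazy enumeration: products are generated on demand rather than stored, so
# filters or scoring functions can stream over the library.
def enumerate_products(reagents_a, reagents_b):
    for a, b in itertools.product(reagents_a, reagents_b):
        yield f"{a}+{b}"          # stand-in for an actual bond-forming step

first_five = list(itertools.islice(enumerate_products(amines, acids), 5))
print(first_five)
```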