185 research outputs found

    Emerging Chemical Patterns for Virtual Screening and Knowledge Discovery

    The adaptation and evaluation of contemporary data mining methods to chemical and biological problems is one of the major areas of research in chemoinformatics. Currently, large databases containing millions of small organic compounds are publicly available, and the need for advanced methods to analyze these data is increasing. Most methods used in chemoinformatics, e.g. quantitative structure-activity relationship (QSAR) modeling, decision trees and similarity searching, depend on the availability of large, high-quality training data sets. However, in biological settings, the availability of these training sets is rather limited. This is especially true for early stages of drug discovery projects, where typically only a few active molecules are available. The ability of chemoinformatic methods to generalize from small training sets and accurately predict compound properties such as activity, ADME or toxicity is thus crucially important. Additionally, biological data such as results from high-throughput screening (HTS) campaigns are heavily biased towards inactive compounds. This bias presents an additional challenge for the adaptation of data mining methods and distinguishes chemoinformatics data from the standard benchmark scenarios in the data mining community. Even if a highly accurate classifier were available, it would still be necessary to evaluate the predictions experimentally. These experiments are both costly and time-consuming, and the need to optimize resources has driven the development of integrated screening protocols which try to minimize experimental effort while still reaching high hit rates of active compounds. This integration, termed “sequential screening”, benefits from the complementary nature of experimental HTS and computational virtual screening (VS) methods. In this thesis, a current data mining framework based on class-specific nominal combinations of attributes (emerging patterns) is adapted to chemoinformatic problems and thoroughly evaluated.
Combining emerging pattern methodology and the well-known notion of chemical descriptors, emerging chemical patterns (ECPs) are defined as class-specific descriptor value range combinations. Each pattern can be thought of as a region in chemical space which is dominated by compounds from one class only. Based on chemical patterns, several experiments are presented which evaluate the performance of pattern-based knowledge mining, property prediction, compound ranking and sequential screening. ECP-based classification is implemented and evaluated on four activity classes for the prediction of compound potency levels. Compared to decision trees and a Bayesian binary QSAR method, ECP-based classification produces high accuracy in positive and negative classes even on the basis of very small training sets, a result especially valuable for chemoinformatic problems. The simple nature of ECPs as class-specific descriptor value range combinations makes them easily interpretable. In a knowledge mining experiment, this is used to relate ECPs to changes in the interaction network of protein-ligand complexes when the binding conformation is replaced by a computer-modeled conformation. ECPs capture well-known energetic differences between binding and energy-minimized conformations and additionally provide new insight into these differences in a class-level analysis. Finally, the integration of ECPs and HTS is evaluated in simulated lead optimization and sequential screening experiments. The high accuracy on very small training sets is exploited to design an iterative simulated lead optimization experiment based on experimental evaluation of randomly selected small training sets. In each iteration, all compounds predicted to be weakly active are removed and the remaining compound set is enriched with highly potent compounds.
On this basis, a simulated sequential screening experiment shows that ECP-based ranking recovers 19% of available compounds while reducing the “experimental” effort to 0.2%. These findings illustrate the potential of sequential screening protocols and will hopefully increase the popularity of this relatively new methodology.

    Powerskin Conference: Proceedings

    The “third skin” of human beings – the building envelope – has a long history of development with a major impact on architecture. As an interface between inside and outside, facades not only determine aspects such as performance and energy efficiency, they also determine the aesthetics of buildings and cities, to the extent that they can create cultural identity. The invention of the curtain wall made facades independent from the building structure, but the facade remained an important – yet passive – element. Powerskin Conference: Proceedings, January 19th 2017, Munich.

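    Die britische Brigg Water Nymph oder: "... dass solche [...] Verhönungen von Beamten auf deutschem Boden auch selbst einem Engländer nicht gestattet sind ..." (The British brig Water Nymph, or: "... that such [...] mockery of officials on German soil is not permitted even to an Englishman ...")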

    In 2002, before extensive coastal protection measures were carried out on the west coast of the island of Darss in Mecklenburg-Western Pomerania, archaeological investigations of a nineteenth-century shipwreck were conducted off the village of Ahrenshoop. In several excavation sections, important details of the ship’s construction were recorded and documented. Building on this investigation, intensive archival research led to the identification of the vessel as the British brig Water Nymph and shed light on life on board a typical nineteenth-century merchant vessel as well as its later loss off the coast of the German Empire.

    Semantic Representation of Physics Research Data

    Improvements in web technologies and artificial intelligence enable novel, more data-driven research practices for scientists. However, scientific knowledge generated from data-intensive research practices is disseminated in unstructured formats, which hinders scholarly communication in various respects. The traditional document-based representation of scholarly information hampers the reusability of research contributions. To address this concern, we developed the Physics Ontology (PhySci) to represent physics-related scholarly data in a machine-interpretable format. PhySci facilitates knowledge exploration, comparison, and organization of such data by representing it as knowledge graphs. It establishes a unique conceptualization to increase the visibility and accessibility of the digital content of physics publications. We present the iterative design principles by outlining a methodology for its development and applying three different evaluation approaches: data-driven and criteria-based evaluation, as well as ontology testing.