810 research outputs found

    Software for supporting large scale data processing for High Throughput Screening

    High Throughput Screening is a valuable data generation technique for data-driven knowledge discovery. Because the rate of data generation is so great, it is a challenge to cope with the demands of post-experiment data analysis. This thesis presents three software solutions that I implemented in an attempt to alleviate this problem. The first is K-Screen, a Laboratory Information Management System designed to handle and visualize large High Throughput Screening datasets. K-Screen is being successfully used by the University of Kansas High Throughput Screening Laboratory to better organize and visualize their data. The other two solutions are algorithms designed to reduce search times for chemical similarity searches using 1-dimensional fingerprints. The first algorithm balances information content in bit strings in an attempt to find better ordering and segmentation patterns for chemical fingerprints. The second algorithm eliminates redundant pruning calculations for large batch chemical similarity searches and shows a 250% improvement over the fastest current fingerprint search algorithm for large batch queries.
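
As a rough illustration of what pruning means in a fingerprint similarity search, the sketch below uses the widely known bit-count bound on the Tanimoto coefficient: the score can never exceed the ratio of the smaller to the larger bit count, so candidates failing that test are skipped without a full comparison. This is a generic technique and not the thesis's algorithm; the function names and the use of Python integers as bit strings are assumptions made for the example.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union


def search(query: int, database: List[int], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, score) for database entries scoring >= threshold.

    Pruning step: Tanimoto(A, B) <= min(|A|, |B|) / max(|A|, |B|), so any
    candidate whose bit-count ratio is below the threshold cannot be a hit.
    """
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(database):
        f_bits = popcount(fp)
        lo, hi = min(q_bits, f_bits), max(q_bits, f_bits)
        if hi == 0 or lo / hi < threshold:
            continue  # pruned: the upper bound is already below the threshold
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((idx, score))
    return hits
```

In a batch setting the bit counts of the database entries would be computed once and reused across queries, which is the kind of repeated work a batch-oriented method can avoid.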

    Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing

    Motivation: Similarity searching and clustering of chemical compounds by structural similarities are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries

    Optimization of star research algorithm for esmo star tracker

    This paper explains in detail the design and development of a star research algorithm, embedded on a star tracker, by the ISAE/SUPAERO team. This research algorithm is inspired by musical techniques. This work will be carried out as part of the ESMO (European Student Moon Orbiter) project by different teams of students and professors from ISAE/SUPAERO (Institut Supérieur de l’Aéronautique et de l’Espace). To date, the system engineering studies have been completed, and the work presented here concerns the algorithmic and embedded software development. The physical architecture of the sensor relies on the APS 750 developed by the CIMI laboratory of ISAE/SUPAERO. First, a star research algorithm based on the image acquired in lost-in-space mode (one of the star tracker operational modes) will be presented; it is inspired by musical recognition techniques that correlate a digital signature (hash) with those stored in databases. The musical recognition principle is based on fingerprinting, i.e. the extraction of points of interest in the studied signal. In the musical context, the signal spectrogram is used to identify these points. Applying this technique in the image processing domain requires a tool equivalent to the spectrogram. Those points of interest create a hash and are used to efficiently search within the previously sorted database for comparison. The main goals of this research algorithm are to minimise the number of computation steps in order to deliver information at a higher frequency, and to increase the robustness of the computation against the different possible disturbances.

    Biometric Systems

    Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to provide readers with the life cycle of different biometric authentication systems, from their design and development to qualification and final application. The major systems discussed in this book include fingerprint identification, face recognition, iris segmentation and classification, signature verification and other miscellaneous systems which describe management policies of biometrics, reliability measures, pressure based typing and signature verification, bio-chemical systems and behavioral characteristics. In summary, this book provides students and researchers with different approaches to developing biometric authentication systems and at the same time includes state-of-the-art approaches in their design and development. The approaches have been thoroughly tested on standard databases and in real world applications.

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, the code base has grown significantly, however, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvement to existing functionality that has led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work

    Bloom filters for molecules

    Ultra-large chemical libraries are reaching tens to hundreds of billions of molecules. A challenge for these libraries is to efficiently check if a proposed molecule is present. Here we propose and study Bloom filters for testing if a molecule is present in a set using either string or fingerprint representations. Bloom filters are small enough to hold billions of molecules in just a few GB of memory and can check membership in under a millisecond. We found that string representations can have a false positive rate below 1% and require significantly less storage than fingerprints. Canonical SMILES with Bloom filters and the simple FNV hashing function provide fast and accurate membership tests with small memory requirements. We provide a general implementation and specific filters for detecting if a molecule is purchasable, patented, or a natural product according to existing databases at https://github.com/whitead/molbloo
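
The idea in this abstract can be sketched with a small Bloom filter over SMILES strings hashed with FNV-1a. This is a minimal sketch, not the authors' implementation: the class and function names are hypothetical, the input strings are assumed to be already canonicalized, and deriving the k bit positions from two FNV-1a values (double hashing) is a choice made for this example.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


class SmilesBloomFilter:
    """Hypothetical Bloom filter for membership tests on SMILES strings."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _indices(self, smiles: str):
        data = smiles.encode("utf-8")
        h1 = fnv1a_64(data)
        # Second hash from a salted input; forced odd so the stride is never 0.
        h2 = fnv1a_64(data + b"\x00") | 1
        for i in range(self.n_hashes):
            yield (h1 + i * h2) % self.n_bits

    def add(self, smiles: str) -> None:
        for idx in self._indices(smiles):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, smiles: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indices(smiles))


library = SmilesBloomFilter(n_bits=1 << 16, n_hashes=4)
for smi in ("CCO", "c1ccccc1", "CC(=O)O"):
    library.add(smi)
print("CCO" in library)
```

A Bloom filter never reports a stored molecule as absent, so the only error mode is a false positive, whose rate depends on the number of bits, the number of hash functions, and the number of stored items.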

    Molecular Similarity and Xenobiotic Metabolism

    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism has been developed. The algorithm is based on a statistical analysis of the occurrences of atom centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions. This data mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website. ----Boehringer-Ingelhie