    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

    Toward transferable interatomic van der Waals interactions without electrons: The role of multipole electrostatics and many-body dispersion

    We estimate polarizabilities of atoms in molecules without electron density, using a Voronoi tessellation approach instead of conventional density partitioning schemes. The resulting atomic dispersion coefficients are calculated, as well as many-body dispersion effects on intermolecular potential energies. We also estimate contributions from multipole electrostatics and compare them to dispersion. We assess the performance of the resulting intermolecular interaction model from dispersion and electrostatics for more than 1,300 neutral and charged, small organic molecular dimers. Applications to water clusters, the benzene crystal, the anti-cancer drug ellipticine intercalated between two Watson-Crick DNA base pairs, as well as six macro-molecular host-guest complexes highlight the potential of this method and help to identify points of future improvement. The mean absolute error made by the combination of static electrostatics with many-body dispersion decreases at larger distances, while it plateaus for two-body dispersion, in conflict with the common assumption that the simple $1/R^6$ correction will yield proper dissociative tails. Overall, the method achieves an accuracy well within that of conventional molecular force fields while exhibiting a simple parametrization protocol. Comment: 13 pages, 8 figures

    Algorithms for Extracting Frequent Episodes in the Process of Temporal Data Mining

    An important aspect of the data mining process is the discovery of patterns that strongly influence the problem under study. This paper studies frequent episode mining through the use of parallel pattern discovery algorithms. Parallel pattern discovery algorithms offer better performance and scalability, so they are of great interest to the data mining research community. We highlight several parallel and distributed frequent pattern mining algorithms on various platforms and present a comparative study of their main features. The study takes into account the new possibilities that arise with the Compute Unified Device Architecture (CUDA) on the latest generation of graphics processing units. Given their high performance, low cost, and growing number of features, GPUs are viable platforms for an efficient implementation of frequent pattern mining algorithms. Keywords: Frequent Pattern Mining, Parallel Computing, Dynamic Load Balancing, Temporal Data Mining, CUDA, GPU, Fermi, Thread

    Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

    Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally accepted benchmark. The dual structure of scientific data, coupled with the complex nature of processing, complicates the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It has failed to draw sufficient attention, though, because of the ambiguous plain-language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support for both array and relational data, and extensible in executing any user code inside the system by means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data. Comment: 32 pages, 3 figures