Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.
Toward transferable interatomic van der Waals interactions without electrons: The role of multipole electrostatics and many-body dispersion
We estimate polarizabilities of atoms in molecules without electron density,
using a Voronoi tessellation approach instead of conventional density
partitioning schemes. The resulting atomic dispersion coefficients are
calculated, as well as many-body dispersion effects on intermolecular potential
energies. We also estimate contributions from multipole electrostatics and
compare them to dispersion. We assess the performance of the resulting
intermolecular interaction model from dispersion and electrostatics for more
than 1,300 neutral and charged, small organic molecular dimers. Applications to
water clusters, the benzene crystal, the anti-cancer drug
ellipticine intercalated between two Watson-Crick DNA base pairs, as well as
six macro-molecular host-guest complexes highlight the potential of this method
and help to identify points of future improvement. The mean absolute error made
by the combination of static electrostatics with many-body dispersion reduces
at larger distances, while it plateaus for two-body dispersion, in conflict
with the common assumption that the simple correction will yield proper
dissociative tails. Overall, the method achieves an accuracy well within
conventional molecular force fields while exhibiting a simple parametrization
protocol. Comment: 13 pages, 8 figures
Algorithms for Extracting Frequent Episodes in the Process of Temporal Data Mining
An important aspect of the data mining process is the discovery of patterns that have a great influence on the studied problem. The purpose of this paper is to study frequent episode mining through the use of parallel pattern discovery algorithms. Parallel pattern discovery algorithms offer better performance and scalability, so they are of great interest to the data mining research community. The paper highlights several parallel and distributed frequent pattern mining algorithms on various platforms and presents a comparative study of their main features. The study takes into account the new possibilities that arise with the emerging Compute Unified Device Architecture (CUDA) from the latest generation of graphics processing units. Given their high performance, low cost and growing number of features, GPUs are viable solutions for an optimal implementation of frequent pattern mining algorithms. Keywords: Frequent Pattern Mining, Parallel Computing, Dynamic Load Balancing, Temporal Data Mining, CUDA, GPU, Fermi, Thread
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data coupled with the complex nature of processing
complicates the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It has failed to draw sufficient attention, though, because of its
ambiguous plain-language specification and the extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by means of a configurable metaoperator. These features
result in an order of magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data. Comment: 32 pages, 3 figures