The study of probability model for compound similarity searching
The main task of an Information Retrieval (IR) system is to retrieve documents relevant to a user's query. One of the most popular IR retrieval models is the Vector Space Model, which assumes relevance based on similarity, defined as the distance between query and document in the concept space. All currently existing chemical compound database systems have adapted the vector space model to calculate the similarity of a database entry to a query compound. However, the model assumes that the fragments represented by the bits are independent of one another, which is not necessarily true. Hence, the possibility of applying another IR model, the Probabilistic Model (PM), to chemical compound searching is explored. This model estimates the probability that a chemical structure has the same bioactivity as a target compound. It is envisioned that ranking chemical structures in decreasing order of their probability of relevance to the query structure can increase the effectiveness of a molecular similarity searching system. Both the fragment dependence and fragment independence assumptions are taken into consideration in seeking to improve compound similarity searching. After a series of simulated similarity searches, it is concluded that the PM approaches performed better than the existing similarity searching approach, giving better results on all evaluation criteria. Of the two probability models, the BD model showed an improvement over the BIR model.
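The contrast drawn in this abstract between similarity-based and probability-based ranking can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that fingerprints are stored as sets of on-bit indices, that the similarity baseline is the Tanimoto coefficient, and that BIR refers to the binary independence retrieval model with the usual 0.5-smoothed log-odds weights. The names `tanimoto`, `bir_weights`, `bir_score` and the toy data are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints (sets of on-bit indices)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bir_weights(actives, inactives, n_bits):
    """Per-bit log-odds weights, assuming bits are independent (0.5 smoothing)."""
    n_act, n_inact = len(actives), len(inactives)
    weights = []
    for i in range(n_bits):
        r = sum(1 for fp in actives if i in fp)    # actives with bit i set
        s = sum(1 for fp in inactives if i in fp)  # inactives with bit i set
        p = (r + 0.5) / (n_act + 1.0)    # estimate of P(bit i set | active)
        q = (s + 0.5) / (n_inact + 1.0)  # estimate of P(bit i set | inactive)
        weights.append(math.log(p * (1 - q) / (q * (1 - p))))
    return weights


def bir_score(fp, weights):
    """Sum the weights of the bits that are set in the candidate fingerprint."""
    return sum(weights[i] for i in fp)


# Toy training data (assumed): fingerprints over 6 bits with known activity.
actives = [{0, 1, 2}, {0, 1, 3}]
inactives = [{3, 4}, {2, 4, 5}]
weights = bir_weights(actives, inactives, n_bits=6)

# Rank hypothetical candidates in decreasing order of their BIR score.
candidates = {"c1": {0, 1}, "c2": {4, 5}, "c3": {1, 3}}
ranking = sorted(candidates,
                 key=lambda name: bir_score(candidates[name], weights),
                 reverse=True)
print(ranking)
print(tanimoto(actives[0], actives[1]))
```

Bits seen mostly in actives receive positive weights and bits seen mostly in inactives receive negative ones, so a candidate's score rises with the number of activity-associated fragments it contains. A dependence-aware model such as the BD model mentioned above would need additional terms for co-occurring bits, which this sketch does not attempt.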
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a three-layered
ontological structure comprising a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend.
Machine Learning of Molecular Electronic Properties in Chemical Compound Space
The combination of modern scientific computing with electronic structure
theory can lead to an unprecedented amount of data amenable to intelligent data
analysis for the identification of meaningful, novel, and predictive
structure-property relationships. Such relationships enable high-throughput
screening for relevant properties in an exponentially growing pool of virtual
compounds that are synthetically accessible. Here, we present a machine
learning (ML) model, trained on a database of ab initio calculation
results for thousands of organic molecules, that simultaneously predicts
multiple electronic ground- and excited-state properties. The properties
include atomization energy, polarizability, frontier orbital eigenvalues,
ionization potential, electron affinity, and excitation energies. The ML model
is based on a deep multi-task artificial neural network, exploiting underlying
correlations between various molecular properties. The input is identical to
ab initio methods, i.e., nuclear charges and Cartesian coordinates
of all atoms. For small organic molecules the accuracy of such a "Quantum
Machine" is similar, and sometimes superior, to modern quantum-chemical
methods, at negligible computational cost.