Search CORE

25 research outputs found

Recommended from our members

Context-aware visual exploration of molecular databases

Author: Berthold Michael
Di Fatta Giuseppe
Fiannaca Antonino
Gaglio Salvatore
Rizzo Riccardo
Urso Alfonso
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Facilitating the visual exploration of scientific data has received increasing attention in the past decade or so. Especially in life science related application areas the amount of available data has grown at a breath taking pace. In this paper we describe an approach that allows for visual inspection of large collections of molecular compounds. In contrast to classical visualizations of such spaces we incorporate a specific focus of analysis, for example the outcome of a biological experiment such as high throughout screening results. The presented method uses this experimental data to select molecular fragments of the underlying molecules that have interesting properties and uses the resulting space to generate a two dimensional map based on a singular value decomposition algorithm and a self organizing map. Experiments on real datasets show that the resulting visual landscape groups molecules of similar chemical properties in densely connected regions

Central Archive at the University of Reading

CiteSeerX

Archivio istituzionale della ricerca - Università di Palermo

High performance subgraph mining in molecular compounds

Author: M.J. Zaki
O. Weislow
R. Finkel
T. Washio
Y. Chung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

Structured data represented in the form of graphs arises in several fields of the science and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated, load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to linear speedup in a network of workstations

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Efficient mining of discriminative molecular fragments

Author: Berthold Michael R.
Di Fatta Giuseppe
Publication venue
Publication date: 01/01/2005
Field of study

Frequent pattern discovery in structured data is receiving an increasing attention in many application areas of sciences. However, the computational complexity and the large amount of data to be explored often make the sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well known National Cancer Institute’s HIV-screening dataset

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Inductive queries for a drug designing robot scientist

Author: A. Lingas
C. Hansch
C.A. Lipinski
D.R. Jones
D.R. Jones
H. Blockeel
J. Matousek
L. Raedt De
R.D. King
R.D. King
T. Gärtner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments

Lirias

Crossref

Bournemouth University Research Online

The University of Manchester - Institutional Repository

DIAL UCLouvain

A Linear Programming Approach for Molecular QSAR analysis

Author: Kadowaki T.
Saigo H.
Tsuda K.
Publication venue
Publication date: 01/01/2009
Field of study

Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule in high accuracy. Setting a QSAR as a primal target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ the linear programming boosting (LP) for regression. The LP approach allows to include inequality constraints for the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing

MPG.PuRe

LogCHEM: interactive discriminative mining of chemical structure

Author: Nuno A. Fonseca
Rui Camacho
Vítor Santos Costa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

One of the most well known successes of Inductive Logic Programming (ILP) is on Structure-Activity Relationship (SAR) problems. In such problems, ILP has proved several times to be capable of constructing expert comprehensible models that hell) to explain the activity of chemical compounds based on their structure and properties. However, despite its successes on SAR problems, ILP has severe scalability problems that prevent its application oil larger datasets. In this paper we present LogCHEM, an ILP based tool for discriminative interactive mining of chemical fragments. LogCHEM tackles ILP's scalability issues in the context of SAR applications. We show that LogCHEM benefits from the flexibility of ILP both by its ability to quickly extend the original mining model, and by its ability, to interface with external tools. Furthermore, We demonstrate that LogCHEM can be used to mine effectively large chemoinformatics datasets, namely, several datasets from EPA's DSSTox database and on a dataset based on the DTP AIDS anti-viral screen

Crossref

Repositório Aberto da Universidade do Porto

FP-Growth Based New Normalization Technique for Subgraph Ranking

Author
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date
Field of study

Crossref

The development of a knowledge base for basic active structures: an example case of dopamine agonists

Author: A Böcker
A Böcker
A Böcker
A Inokuchi
A Inokuchi
A Inokuchi
A Karwath
CL Russom
CW Gini
G Harper
G Harper
G Klopman
G Klopman
G Klopman
H Ohtaka
Hiroshi Horikawa
JL Medina-Franco
K Tsunoyama
Masumi Yamakawa
Norihito Ohmori
RD King
S Fujishima
S Kramer
Sachio Mori
Satoshi Fujishima
T Okada
T Okada
T Okada
T Okada
T Okada
T Okada
Takashi Okada
Taketo Hayashi
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs. Results This paper presents a BAS knowledge base, BASiC, which currently covers 46 activities and is available on the Internet. We use the dopamine agonists D1, D2, and Dauto as examples and illustrate the process of BAS extraction. The resulting BASs were reasonably interpreted after proposing a few template structures. Conclusions The knowledge base is useful for drug design. Proposed BASs and their supporting structures in the knowledge base will facilitate the development of new template structures for other activities, and will be useful in the design of new lead compounds via reasonable interpretations of active structures.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DESY