Study of Genetic Algorithm, an Evolutionary Approach

Author: Mrs. K. Jayavani, Dr. G. M. Kadhar Nawaz
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/08/2014

Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories. To do this, data mining uses a variety of algorithms chosen according to the specified measures and thresholds. The results of this analysis are then used to build models based on real-world behavior, which are in turn used to analyze incoming data and make predictions about future behavior. Here, we focus on one efficient evolutionary algorithm, the genetic algorithm. This is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover; they are numerical optimization algorithms inspired by both natural selection and natural genetics. The method is general and can be applied to an extremely wide range of problems. In this paper we discuss genetic algorithm techniques and their application in data mining in detail.
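The operators named in the abstract (selection, crossover, mutation) can be illustrated with a minimal sketch. This is not the authors' implementation: the OneMax objective (count of 1 bits), tournament selection, single-point crossover, and every parameter value below are assumptions chosen only to keep the example small.

```python
import random

def one_max(bits):
    """Toy fitness (assumed for illustration): the number of 1 bits."""
    return sum(bits)

def tournament_select(population, fitness, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover: a prefix of parent a joined to a suffix of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(n_bits=20, pop_size=30, generations=60, seed=0):
    """Evolve bit strings and return the best individual seen in any generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent_a = tournament_select(population, one_max, rng)
            parent_b = tournament_select(population, one_max, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
        # Keep track of the best individual across generations.
        best = max(population + [best], key=one_max)
    return best

if __name__ == "__main__":
    winner = run_ga()
    print(one_max(winner), winner)
```

Swapping `one_max` for a domain-specific fitness function (for example, the quality of a candidate rule in a data mining task) is the usual way such a loop is adapted; the rest of the structure stays the same.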

International Journal on Recent and Innovation Trends in Computing and Communication

When Social Influence Meets Item Inference

Author: Chen Ming-Syan
Huang Liang-Hao
Hung Hui-Ju
Lee Wang-Chien
Pei Jian
Shuai Hong-Han
Yang De-Nian
Publication date: 14/02/2016

Research issues and data mining techniques for product recommendation and viral marketing have been widely studied. Existing works on seed selection in social networks do not take into account the effect of product recommendations in e-commerce stores. In this paper, we investigate the seed selection problem for viral marketing that considers the effects of both social influence and item inference (for product recommendation). We develop a new model, Social Item Graph (SIG), that captures both effects in the form of hyperedges. Accordingly, we formulate a seed selection problem, called Social Item Maximization Problem (SIMP), and prove the hardness of SIMP. We design an efficient algorithm with a performance guarantee, called Hyperedge-Aware Greedy (HAG), for SIMP and develop a new index structure, called SIG-index, to accelerate the computation of the diffusion process in HAG. Moreover, to construct realistic SIG models for SIMP, we develop a statistical inference based framework to learn the weights of hyperedges from data. Finally, we perform a comprehensive evaluation of our proposals against various baselines. Experimental results validate our ideas and demonstrate the effectiveness and efficiency of the proposed model and algorithms over the baselines.

arXiv.org e-Print Archive

Crossref

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
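The general idea of receiver-initiated load balancing, in which an idle worker asks a busy peer for work rather than waiting to be assigned some, can be sketched with a small single-process simulation. This is not the algorithm from the paper: the function name `run`, the fixed task queues, the choice of the most loaded worker as donor (which assumes global knowledge a peer-to-peer system would not have), and the split-in-half policy are all assumptions made for illustration.

```python
from collections import deque

def run(initial_loads, balance=True):
    """Simulate workers that each finish one task per step.

    With balance=True, every idle worker (the receiver) initiates a transfer
    by taking half of the queue of the currently most loaded worker.
    Returns (steps until all work is done, number of transfers).
    """
    queues = [deque(range(n)) for n in initial_loads]
    steps = 0
    transfers = 0
    while any(queues):
        if balance:
            for receiver in queues:
                if receiver:
                    continue  # only idle workers ask for work
                donor = max(queues, key=len)  # assumed donor policy
                if len(donor) < 2:
                    continue  # nothing worth splitting
                for _ in range(len(donor) // 2):
                    receiver.append(donor.pop())
                transfers += 1
        # Every worker that holds a task completes exactly one.
        for queue in queues:
            if queue:
                queue.popleft()
        steps += 1
    return steps, transfers

if __name__ == "__main__":
    print(run([8, 0, 0, 0], balance=False))  # (8, 0)
    print(run([8, 0, 0, 0], balance=True))   # (2, 3)
```

In the simulated case, eight tasks that start on one of four workers finish in two steps instead of eight once idle workers pull work. A real irregular search tree would also generate new tasks while running, which this static sketch does not model.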

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Inductive queries for a drug designing robot scientist

Author: A. Lingas
C. Hansch
C.A. Lipinski
D.R. Jones
H. Blockeel
J. Matousek
L. De Raedt
R.D. King
T. Gärtner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it; and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.

Lirias

Crossref

Bournemouth University Research Online

The University of Manchester - Institutional Repository

DIAL UCLouvain