Search CORE

14,073 research outputs found

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Data mining based cyber-attack detection

Author: Tianfield Huaglory
Publication venue
Publication date: 31/05/2017
Field of study

ResearchOnline@GCU

RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

Author: Alexandru Pîrjan
Ion Lungu
Publication venue
Publication date
Field of study

In this paper, we depict some of the most widely used data mining algorithms that have an overwhelming utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. The above defined parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research refer to topics like clustering, classification, association analysis, statistical learning, link mining. In the following, after a brief description of each algorithm, we analyze its application potential and research issues concerning the optimization of the data mining process. After the presentation of the data mining algorithms, we will depict the most important data mining algorithms included in Microsoft and Oracle software products, useful suggestions and criteria in choosing the most recommended algorithm for solving a mentioned task, advantages offered by these software products.data mining optimization, data mining algorithms, software solutions

Research Papers in Economics

A Fast Minimal Infrequent Itemset Mining Algorithm

Author: Demchuk Kostyantyn
Leith Douglas J.
Publication venue
Publication date: 01/01/2014
Field of study

A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art and the scalability of the algorithm to realistically-sized datasets up to several million records

arXiv.org e-Print Archive

MURAL - Maynooth University Research Archive Library

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

Extension of CART using multiple splits under order restrictions

Author: Salanti G.
Strobl R.
Ulm Kurt
Publication venue
Publication date: 01/01/2003
Field of study

CART was introduced by Breiman et al. (1984) as a classification tool. It divides the whole sample recursively in two subpopulations by finding the best possible split with respect to a optimisation criterion. This method, restricted up to date to binary splits, is extended in this paper for allowing also multiple splits. The main problem with this extension is related to the optimal number of splits and the location of the corresponding cutpoints. In order to reduce the computational effort and enhance parsimony, the reduced isotonic regression was used in order to solve this problem. The extended CART method was tested in a simulation study and was compared with the classical approach in an epidemiological study. In both studies the extended CART turned out to be a useful and reliable alternative

Open Access LMU