
    Dynamically adaptive partition-based interest management in distributed simulation

    The performance and scalability of distributed simulations depend primarily on the effectiveness of the employed interest management (IM) scheme, which aims to reduce the overall computational and messaging effort on the shared data to a necessary minimum. Existing IM approaches, which are based on variations or combinations of two principal data distribution techniques, namely region-based and grid-based techniques, perform poorly if the simulation develops an overloaded host. To facilitate moving processing load from overloaded areas of the shared data to less loaded hosts, a partition-based technique is introduced that allows variable-size partitioning of the shared data. Based on this data distribution technique, an IM approach is sketched that dynamically adapts to the access latencies of simulation objects on the shared data as well as to the physical location of the objects. Since this re-distribution is decided based on the messaging effort of the simulation objects for updating data partitions, any load-balanced constellation has the additional advantage of minimal overall messaging effort. Hence, the IM scheme dynamically resolves messaging overload as well as overloading of hosts with simulation objects, and therefore facilitates dynamic system scalability.
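    The re-balancing idea described above can be illustrated with a minimal Python sketch: partitions track how many update messages they generate, and the most message-heavy partitions are moved off overloaded hosts. All names (Partition, rebalance, msg_count, threshold) are illustrative assumptions, not identifiers from the paper.

        # Hypothetical sketch: re-assign variable-size partitions based on
        # per-partition update-message counts observed in the last interval.
        from dataclasses import dataclass

        @dataclass
        class Partition:
            bounds: tuple        # extent of shared data covered by this partition
            host: str            # host currently owning the partition
            msg_count: int = 0   # update messages observed in the last interval

        def rebalance(partitions, hosts, threshold):
            """Move the most message-heavy partitions off overloaded hosts."""
            load = {h: 0 for h in hosts}
            for p in partitions:
                load[p.host] += p.msg_count
            for p in sorted(partitions, key=lambda p: p.msg_count, reverse=True):
                if load[p.host] > threshold:
                    target = min(load, key=load.get)   # least-loaded host
                    if target != p.host and load[target] + p.msg_count <= threshold:
                        load[p.host] -= p.msg_count
                        load[target] += p.msg_count
                        p.host = target                # re-assign the partition
            return partitions

    Because the decision criterion is the observed messaging effort itself, a configuration that balances host load this way also keeps the overall messaging effort low, which is the property the abstract highlights.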

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    This report presents the detailed steps and results of a structured review of the code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough space in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).

    Customer profile classification using transactional data

    Customer profiles are by definition made up of factual and transactional data. It is often the case that, for reasons such as the high cost of data acquisition and/or data protection, only the transactional data are available for data mining operations. Transactional data, however, tend to be highly sparse and skewed because a large proportion of customers engage in very few transactions. This can bias the prediction accuracy of classifiers built on such data towards the larger proportion of customers with few transactions. This paper investigates an approach for accurately and confidently grouping and classifying customers into bins on the basis of the number of their transactions. The experiments we conducted on highly sparse and skewed real-world transactional data show that our proposed approach can be used to identify a critical point at which customer profiles can be more confidently distinguished.
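    A possible reading of the binning step is sketched below: customers are grouped by transaction count, a classifier is evaluated per bin, and the per-bin accuracies are inspected to find the point where performance stabilises. The function name, the choice of RandomForestClassifier, and the minimum-bin-size cutoff are assumptions for illustration only, not the paper's actual method or thresholds.

        # Illustrative sketch: per-bin classification accuracy by transaction count.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        def accuracy_by_transaction_bin(X, y, n_tx, edges):
            """X: feature matrix, y: labels, n_tx: transactions per customer
            (all NumPy arrays); edges: bin boundaries on the transaction count."""
            scores = {}
            for lo, hi in zip(edges[:-1], edges[1:]):
                mask = (n_tx >= lo) & (n_tx < hi)
                if mask.sum() < 50:          # skip bins with too few customers
                    continue
                clf = RandomForestClassifier(n_estimators=100, random_state=0)
                scores[(lo, hi)] = cross_val_score(clf, X[mask], y[mask], cv=5).mean()
            return scores                    # inspect where accuracy levels off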

    MRPR: a MapReduce solution for prototype reduction in big data classification

    In the era of big data, analyzing and extracting knowledge from large-scale data sets is a very interesting and challenging task. The application of standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embrace the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim at representing original training data sets with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and sensitivity to noise of the nearest neighbor rule. However, standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework to distribute the functioning of these algorithms across a cluster of computing elements, proposing several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.
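    The map/reduce split described above can be sketched in a few lines of Python. The sketch assumes a caller-supplied prototype reduction routine and uses plain concatenation as the merge strategy; the abstract mentions several integration strategies, of which this is only the simplest conceivable one, and the function names are hypothetical.

        # Rough sketch of distributed prototype reduction (illustrative only).
        from functools import reduce

        def map_phase(partition, reduce_prototypes):
            """Apply a prototype reduction algorithm to one data partition."""
            return reduce_prototypes(partition)     # -> reduced list of (x, y) prototypes

        def reduce_phase(partial_sets):
            """Join the partial reduced sets into a single prototype set."""
            return reduce(lambda a, b: a + b, partial_sets, [])

        def mrpr(dataset, n_partitions, reduce_prototypes):
            partitions = [dataset[i::n_partitions] for i in range(n_partitions)]
            partials = [map_phase(p, reduce_prototypes) for p in partitions]  # parallel on a cluster
            return reduce_phase(partials)           # prototypes for the final kNN classifier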
