
    Dynamically adaptive partition-based interest management in distributed simulation

    The performance and scalability of distributed simulations depend primarily on the effectiveness of the employed interest management (IM) scheme, which aims to reduce the overall computational and messaging effort on the shared data to a necessary minimum. Existing IM approaches, which are based on variations or combinations of two principal data distribution techniques, namely region-based and grid-based techniques, perform poorly if the simulation develops an overloaded host. To facilitate moving processing load from overloaded areas of the shared data to less loaded hosts, a partition-based technique is introduced that allows variable-size partitioning of the shared data. Based on this data distribution technique, an IM approach is sketched that dynamically adapts to the access latencies of simulation objects on the shared data as well as to the physical location of the objects. Since this re-distribution is decided based on the messaging effort of the simulation objects for updating data partitions, any load-balanced constellation has the additional advantage of minimal overall messaging effort. Hence, the IM scheme dynamically resolves messaging overload as well as overloading of hosts with simulation objects, and therefore facilitates dynamic system scalability.
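    The re-balancing idea described above can be illustrated with a minimal Python sketch: partitions track how many update messages they generate, and the most message-heavy partitions are moved off overloaded hosts. All names (Partition, rebalance, msg_count, threshold) are illustrative assumptions, not identifiers from the paper.

        # Hypothetical sketch: re-assign variable-size partitions based on
        # per-partition update-message counts observed in the last interval.
        from dataclasses import dataclass

        @dataclass
        class Partition:
            bounds: tuple        # extent of shared data covered by this partition
            host: str            # host currently owning the partition
            msg_count: int = 0   # update messages observed in the last interval

        def rebalance(partitions, hosts, threshold):
            """Move the most message-heavy partitions off overloaded hosts."""
            load = {h: 0 for h in hosts}
            for p in partitions:
                load[p.host] += p.msg_count
            for p in sorted(partitions, key=lambda p: p.msg_count, reverse=True):
                if load[p.host] > threshold:
                    target = min(load, key=load.get)   # least-loaded host
                    if target != p.host and load[target] + p.msg_count <= threshold:
                        load[p.host] -= p.msg_count
                        load[target] += p.msg_count
                        p.host = target                # re-assign the partition
            return partitions

    Because the decision criterion is the observed messaging effort itself, a configuration that balances host load this way also keeps the overall messaging effort low, which is the property the abstract highlights.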

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    This report presents the detailed steps and results of a structured review of the code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough space in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).

    Customer profile classification using transactional data

    Customer profiles are by definition made up of factual and transactional data. It is often the case that, for reasons such as the high cost of data acquisition and/or data protection, only the transactional data are available for data mining operations. Transactional data, however, tend to be highly sparse and skewed because a large proportion of customers engage in very few transactions. This can bias the prediction accuracy of classifiers built on such data towards the larger proportion of customers with few transactions. This paper investigates an approach for accurately and confidently grouping and classifying customers into bins on the basis of the number of their transactions. The experiments we conducted on highly sparse and skewed real-world transactional data show that our proposed approach can be used to identify a critical point at which customer profiles can be more confidently distinguished.
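    A possible reading of the binning step is sketched below: customers are grouped by transaction count, a classifier is evaluated per bin, and the per-bin accuracies are inspected to find the point where performance stabilises. The function name, the choice of RandomForestClassifier, and the minimum-bin-size cutoff are assumptions for illustration only, not the paper's actual method or thresholds.

        # Illustrative sketch: per-bin classification accuracy by transaction count.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        def accuracy_by_transaction_bin(X, y, n_tx, edges):
            """X: feature matrix, y: labels, n_tx: transactions per customer
            (all NumPy arrays); edges: bin boundaries on the transaction count."""
            scores = {}
            for lo, hi in zip(edges[:-1], edges[1:]):
                mask = (n_tx >= lo) & (n_tx < hi)
                if mask.sum() < 50:          # skip bins with too few customers
                    continue
                clf = RandomForestClassifier(n_estimators=100, random_state=0)
                scores[(lo, hi)] = cross_val_score(clf, X[mask], y[mask], cv=5).mean()
            return scores                    # inspect where accuracy levels off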

    MRPR: a MapReduce solution for prototype reduction in big data classification

    In the era of big data, analyzing and extracting knowledge from large-scale data sets is a very interesting and challenging task. The application of standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embrace the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim at representing original training data sets with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and sensitivity to noise of the nearest neighbor rule. However, standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework to distribute the functioning of these algorithms across a cluster of computing elements, proposing several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.
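    The map/reduce split described above can be sketched in a few lines of Python. The sketch assumes a caller-supplied prototype reduction routine and uses plain concatenation as the merge strategy; the abstract mentions several integration strategies, of which this is only the simplest conceivable one, and the function names are hypothetical.

        # Rough sketch of distributed prototype reduction (illustrative only).
        from functools import reduce

        def map_phase(partition, reduce_prototypes):
            """Apply a prototype reduction algorithm to one data partition."""
            return reduce_prototypes(partition)     # -> reduced list of (x, y) prototypes

        def reduce_phase(partial_sets):
            """Join the partial reduced sets into a single prototype set."""
            return reduce(lambda a, b: a + b, partial_sets, [])

        def mrpr(dataset, n_partitions, reduce_prototypes):
            partitions = [dataset[i::n_partitions] for i in range(n_partitions)]
            partials = [map_phase(p, reduce_prototypes) for p in partitions]  # parallel on a cluster
            return reduce_phase(partials)           # prototypes for the final kNN classifier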
