32,502 research outputs found

    Early Accurate Results for Advanced Analytics on MapReduce

    Full text link
    Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems although these are intended for `big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop that was designed to minimize the changes required to the MapReduce framework. Various tests of EARL of Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop.Comment: VLDB201

    Same traits, different variance : Item-Level Variation Within Personality Measures

    Get PDF
    © 2014 the Author(s). This article has been published under the terms of the Creative Commons Attribution License. Without requesting permission from the Author or SAGE, you may further copy, distribute, transmit, and adapt the article, with the condition that the Author and SAGE Open are in each case credited as the source of the article. The version of record, Jamie S. Churcyard, Karen J. Pine, Shivani Sharma, Ben (C) Fletcher, ' Same Traits, Difference Variance: Item-Level Variation Within Personality Measures', SAGE Open, 2014, is available online via doi: 10.1177/2158244014522634Personality trait questionnaires are regularly used in individual differences research to examine personality scores between participants, although trait researchers tend to place little value on intra-individual variation in item ratings within a measured trait. The few studies that examine variability indices have not considered how they are related to a selection of psychological outcomes, so we recruited 160 participants (age M = 24.16, SD = 9.54) who completed the IPIP-HEXACO personality questionnaire and several outcome measures. Heterogenous within-subject differences in item ratings were found for every trait/facet measured, with measurement error that remained stable across the questionnaire. Within-subject standard deviations, calculated as measures of individual variation in specific item ratings within a trait/facet, were related to outcomes including life satisfaction and depression. This suggests these indices represent valid constructs of variability, and that researchers administering behavior statement trait questionnaires with outcome measures should also apply item-level variability indices.Peer reviewedFinal Published versio

    From Quantity to Quality: Massive Molecular Dynamics Simulation of Nanostructures under Plastic Deformation in Desktop and Service Grid Distributed Computing Infrastructure

    Get PDF
    The distributed computing infrastructure (DCI) on the basis of BOINC and EDGeS-bridge technologies for high-performance distributed computing is used for porting the sequential molecular dynamics (MD) application to its parallel version for DCI with Desktop Grids (DGs) and Service Grids (SGs). The actual metrics of the working DG-SG DCI were measured, and the normal distribution of host performances, and signs of log-normal distributions of other characteristics (CPUs, RAM, and HDD per host) were found. The practical feasibility and high efficiency of the MD simulations on the basis of DG-SG DCI were demonstrated during the experiment with the massive MD simulations for the large quantity of aluminum nanocrystals (∼102\sim10^2-10310^3). Statistical analysis (Kolmogorov-Smirnov test, moment analysis, and bootstrapping analysis) of the defect density distribution over the ensemble of nanocrystals had shown that change of plastic deformation mode is followed by the qualitative change of defect density distribution type over ensemble of nanocrystals. Some limitations (fluctuating performance, unpredictable availability of resources, etc.) of the typical DG-SG DCI were outlined, and some advantages (high efficiency, high speedup, and low cost) were demonstrated. Deploying on DG DCI allows to get new scientific quality\it{quality} from the simulated quantity\it{quantity} of numerous configurations by harnessing sufficient computational power to undertake MD simulations in a wider range of physical parameters (configurations) in a much shorter timeframe.Comment: 13 pages, 11 pages (http://journals.agh.edu.pl/csci/article/view/106

    Human gesture classification by brute-force machine learning for exergaming in physiotherapy

    Get PDF
    In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier like Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. The gestures can have partially similar postures, such that the classifier will decide on the dissimilar postures. This brute-force classification strategy is permitted, because dynamic human gestures show sufficient dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures in an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Also, ground truth can be easily obtained, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can be easily added, which is advantageous in adaptive game development. We evaluate our strategy by a leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an excellent accuracy rate of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy rate of 98.37%, which is on average 3.57% better then existing methods
    • …
    corecore