Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which
advanced analytical applications on very massive data sets can satisfy their
time and resource constraints. Unfortunately, MapReduce-oriented systems
currently offer no methods or tools for computing accurate early results,
even though these systems are intended for 'big data'. We therefore proposed
and implemented a non-parametric extension of Hadoop
which allows the incremental computation of early results for arbitrary
work-flows, along with reliable on-line estimates of the degree of accuracy
achieved so far in the computation. These estimates are based on a technique
called bootstrapping that has been widely employed in statistics and can be
applied to arbitrary functions and data distributions. In this paper, we
describe our Early Accurate Result Library (EARL) for Hadoop that was designed
to minimize the changes required to the MapReduce framework. We present
tests of EARL for Hadoop that characterize the common situations in which EARL
can provide major speed-ups over the current version of Hadoop.
Comment: VLDB201
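The bootstrap-based accuracy estimate described above can be illustrated outside Hadoop. The following is a minimal, single-machine sketch under stated assumptions, not EARL's actual API or implementation. It grows a uniform random sample, uses bootstrap resampling to estimate the standard error of a statistic (the mean by default), and stops once the coefficient of variation of the estimate falls below a target. The function names `bootstrap_std_error` and `early_result`, the doubling schedule, the 200 resamples and the 1% threshold are all hypothetical choices made for illustration.

```python
import random
import statistics

def bootstrap_std_error(sample, stat=statistics.fmean, n_resamples=200, seed=0):
    """Bootstrap estimate of the standard error of stat(sample)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Draw a resample of the same size as the sample, with replacement.
        resample = [rng.choice(sample) for _ in range(len(sample))]
        estimates.append(stat(resample))
    # The spread of the resampled statistics approximates its sampling error.
    return statistics.stdev(estimates)

def early_result(data, stat=statistics.fmean, target_cv=0.01,
                 start=100, growth=2, seed=0):
    """Grow a uniform random sample until the bootstrap coefficient of
    variation of the estimate is at most target_cv (or the data run out).
    Returns (estimate, cv, sample_size)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)  # every prefix of a shuffled list is a uniform sample
    n = min(start, len(shuffled))
    while True:
        sample = shuffled[:n]
        estimate = stat(sample)
        se = bootstrap_std_error(sample, stat, seed=seed)
        cv = se / abs(estimate) if estimate != 0 else float("inf")
        if cv <= target_cv or n == len(shuffled):
            return estimate, cv, n
        n = min(n * growth, len(shuffled))
```

Because the bootstrap only needs to recompute the statistic on resamples, `stat` can be replaced by another function of the sample (for example `statistics.median`). This reflects the abstract's point that the technique applies to arbitrary functions and data distributions.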
Same Traits, Different Variance: Item-Level Variation Within Personality Measures
© 2014 the Author(s). This article has been published under the terms of the Creative Commons Attribution License. Without requesting permission from the Author or SAGE, you may further copy, distribute, transmit, and adapt the article, with the condition that the Author and SAGE Open are in each case credited as the source of the article. The version of record, Jamie S. Churcyard, Karen J. Pine, Shivani Sharma, Ben (C) Fletcher, 'Same Traits, Different Variance: Item-Level Variation Within Personality Measures', SAGE Open, 2014, is available online via doi: 10.1177/2158244014522634
Personality trait questionnaires are regularly used in individual differences research to compare personality scores between participants, although trait researchers tend to place little value on intra-individual variation in item ratings within a measured trait. The few studies that examine variability indices have not considered how they relate to a selection of psychological outcomes, so we recruited 160 participants (age M = 24.16, SD = 9.54) who completed the IPIP-HEXACO personality questionnaire and several outcome measures. Heterogeneous within-subject differences in item ratings were found for every trait/facet measured, with measurement error that remained stable across the questionnaire. Within-subject standard deviations, calculated as measures of individual variation in specific item ratings within a trait/facet, were related to outcomes including life satisfaction and depression. This suggests that these indices represent valid constructs of variability, and that researchers administering behavior-statement trait questionnaires alongside outcome measures should also apply item-level variability indices.
Peer reviewed. Final published version.
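The within-subject standard deviation described in the abstract can be computed directly from item ratings. A minimal sketch, assuming ratings are stored per participant as a mapping from item ID to numeric rating, with reverse-keyed items already recoded, and assuming the population SD formula is used (the abstract does not say which formula the authors used). The item IDs, the data layout and the function name `item_variability` are hypothetical and not taken from the paper.

```python
import statistics

def item_variability(ratings_by_participant, facet_items):
    """Return {participant: (facet_mean, within_subject_sd)} for one facet.

    ratings_by_participant maps a participant ID to {item ID: rating};
    reverse-keyed items are assumed to be recoded already.
    """
    result = {}
    for pid, ratings in ratings_by_participant.items():
        values = [ratings[item] for item in facet_items]
        # The usual trait score: the mean rating across the facet's items.
        facet_mean = statistics.mean(values)
        # Item-level variability: SD of this participant's own ratings.
        # Population SD (pstdev) is an assumption; the sample SD would also work.
        result[pid] = (facet_mean, statistics.pstdev(values))
    return result
```

For example, if one participant rates a four-item facet 3, 3, 3, 3 and another rates it 1, 5, 1, 5, both score 3 on the facet, but their within-subject SDs are 0.0 and 2.0. A trait score alone hides this kind of difference.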
From Quantity to Quality: Massive Molecular Dynamics Simulation of Nanostructures under Plastic Deformation in Desktop and Service Grid Distributed Computing Infrastructure
The distributed computing infrastructure (DCI) based on BOINC and
EDGeS-bridge technologies for high-performance distributed computing is used
to port a sequential molecular dynamics (MD) application to a parallel
version for a DCI with Desktop Grids (DGs) and Service Grids (SGs). The actual
metrics of the working DG-SG DCI were measured: host performance was found to
be normally distributed, and other characteristics (CPUs, RAM, and HDD per host)
showed signs of log-normal distributions. The practical
feasibility and high efficiency of MD simulations on the DG-SG DCI
were demonstrated in an experiment with massive MD simulations of a
large quantity of aluminum nanocrystals (-). Statistical
analysis (Kolmogorov-Smirnov test, moment analysis, and bootstrapping analysis)
of the defect density distribution over the ensemble of nanocrystals showed
that a change in plastic deformation mode is followed by a qualitative change
in the type of defect density distribution over the ensemble. Some
limitations of a typical DG-SG DCI (fluctuating performance, unpredictable
availability of resources, etc.) were outlined, and some advantages (high
efficiency, high speedup, and low cost) were demonstrated. Deploying on a DG DCI
makes it possible to obtain new scientific quality from the simulated quantity
of numerous configurations by harnessing enough computational power to
run MD simulations over a wider range of physical parameters
(configurations) in a much shorter time.
Comment: 13 pages, 11 pages (http://journals.agh.edu.pl/csci/article/view/106
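The Kolmogorov-Smirnov and moment analyses mentioned above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' analysis code. It computes the one-sample KS distance between a sample and a normal distribution fitted to it, applies the same test to the logarithm of the data to check for log-normality, and reports sample skewness and excess kurtosis. The function names are hypothetical. Because the normal parameters are estimated from the same data, standard KS p-values would be too conservative (a Lilliefors-type correction would be needed), so only the distances are compared here. The log-normal check also assumes all values are strictly positive, as densities are.

```python
import math
import statistics

def ks_statistic_normal(data):
    """One-sample Kolmogorov-Smirnov D statistic of `data` against a normal
    distribution whose mean and SD are estimated from the data itself."""
    xs = sorted(data)
    n = len(xs)
    dist = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def moments(data):
    """Sample skewness and excess kurtosis (simple moment estimators)."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def compare_normal_lognormal(data):
    """KS distance to a fitted normal for the raw data and for log(data);
    the smaller distance suggests which family fits better."""
    return {
        "normal": ks_statistic_normal(data),
        "lognormal": ks_statistic_normal([math.log(x) for x in data]),
    }
```

A bootstrap analysis, the third method listed, could be layered on top by recomputing `moments` on resamples of each ensemble to obtain error bars for the skewness and kurtosis.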
Human gesture classification by brute-force machine learning for exergaming in physiotherapy
In this paper, a novel approach for human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier like Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. Gestures can share partially similar postures, in which case the classifier decides on the basis of the dissimilar postures. This brute-force classification strategy is viable because dynamic human gestures contain sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when a game is controlled by automatic gesture recognition. Ground truth is also easy to obtain, since all postures in a gesture get the same label, without any discretization into consecutive postures. As a result, new gestures can easily be added, which is advantageous in adaptive game development. We evaluate our strategy by leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an accuracy rate of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy rate of 98.37%, which is on average 3.57% better than existing methods.
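The temporal smoothing step described in the abstract, majority voting in a sliding window over consecutive per-frame predictions, can be sketched as follows. This sketch assumes the per-frame labels have already been produced by some classifier such as a Random Forest. The window length and the function name `smooth_predictions` are hypothetical. The paper's tie-breaking rule is not stated, so here a tie goes to the tied label that occurs earliest in the window.

```python
from collections import Counter, deque

def smooth_predictions(frame_labels, window=5):
    """Majority vote over the last `window` per-frame predictions.

    Returns one smoothed label per frame, starting once the window is full.
    """
    buf = deque(maxlen=window)  # oldest prediction drops out automatically
    smoothed = []
    for label in frame_labels:
        buf.append(label)
        if len(buf) == window:
            # most_common(1) -> [(label, count)]; among equal counts, Counter
            # keeps first-encountered order, i.e. the earliest label in the window.
            smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed
```

Each smoothed label depends only on the most recent `window` frames, so a decision is available as soon as the window fills. This matches the early-classification property the abstract emphasizes.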