COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
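As a concrete illustration of the lazy-evaluation idea, here is a minimal Python sketch under our own assumptions, not code from the paper: members are evaluated one at a time, and voting stops once a normal-approximation confidence interval around the running mean vote excludes the decision boundary. The function name, the Welford variance update, the z threshold, and the binary 0/1 voting are all illustrative choices.

```python
import math

# Hedged sketch of Gaussian lazy ensemble evaluation: evaluate members
# sequentially and stop as soon as the prediction is statistically decided.
# All names and thresholds here are assumptions, not taken from the paper.
def lazy_predict(members, x, z=2.576, min_votes=10):
    """Binary prediction for x, evaluating as few ensemble members as possible."""
    n, mean, m2 = 0, 0.0, 0.0            # Welford running mean/variance of votes
    for tree in members:
        vote = float(tree.predict(x))    # each member's vote assumed in {0.0, 1.0}
        n += 1
        delta = vote - mean
        mean += delta / n
        m2 += delta * (vote - mean)
        if n >= min_votes:
            stderr = math.sqrt(m2 / (n - 1) / n)
            # Stop early once the z-level interval around the mean vote
            # no longer contains the 0.5 decision boundary.
            if abs(mean - 0.5) > z * stderr:
                break
    return int(mean > 0.5)
```

When the early votes are unanimous the running variance is zero, so easy points exit after `min_votes` members; that is where the large evaluation savings would come from.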
ASCI visualization tool evaluation, Version 2.0
The charter of the ASCI Visualization Common Tools subgroup was to investigate and evaluate 3D scientific visualization tools. As part of that effort, a Tri-Lab evaluation effort was launched in February of 1996. The first step was to agree on a thoroughly documented list of 32 features against which all tool candidates would be evaluated. These evaluation criteria were both gleaned from a user survey and determined from informed extrapolation into the future, particularly concerning the 3D nature and extremely large size of ASCI data sets. The second step was to winnow a field of 41 candidate tools down to 11. The selection principle was to be as inclusive as practical, retaining every tool that seemed to hold any promise of fulfilling all of ASCI's visualization needs. These 11 tools were then closely investigated by volunteer evaluators distributed across LANL, LLNL, and SNL. This report contains the results of those evaluations, as well as a discussion of the evaluation philosophy and criteria.
Extraction of cloud statistics from whole sky imaging cameras
Computer codes have been developed to extract basic cloud statistics from whole sky imaging (WSI) cameras. This report documents, on an algorithmic level, the steps and processes underlying these codes. Appendices comment on code details and on how to adapt to future changes in either the source camera or the host computer.
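The report's own algorithms are not reproduced here, but as a rough sketch of the kind of statistic involved, a common whole-sky-imager technique classifies each pixel as cloud or clear sky by thresholding the red/blue channel ratio. The function name, threshold value, and masking scheme below are assumptions for illustration, not details from the report.

```python
import numpy as np

def cloud_fraction(rgb, threshold=0.6, sky_mask=None):
    """Fraction of sky pixels classified as cloud in an RGB sky image.

    rgb: (H, W, 3) array; sky_mask: optional boolean mask of the sky dome.
    The 0.6 red/blue threshold is an illustrative assumption."""
    red = rgb[..., 0].astype(float)
    blue = rgb[..., 2].astype(float)
    ratio = red / np.maximum(blue, 1e-6)   # guard against division by zero
    cloudy = ratio > threshold             # clear sky is blue-rich (low ratio)
    if sky_mask is not None:
        cloudy = cloudy[sky_mask]
    return float(np.mean(cloudy))
```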
SMOTE: Synthetic Minority Over-sampling Technique
An approach to the construction of classifiers from imbalanced datasets is
described. A dataset is imbalanced if the classification categories are not
approximately equally represented. Often, real-world data sets are predominantly
composed of "normal" examples with only a small percentage of "abnormal" or
"interesting" examples. It is also the case that the cost of misclassifying an
abnormal (interesting) example as a normal example is often much higher than
the cost of the reverse error. Under-sampling of the majority (normal) class
has been proposed as a good means of increasing the sensitivity of a classifier
to the minority class. This paper shows that combining our method of
over-sampling the minority (abnormal) class with under-sampling the majority
(normal) class achieves better classifier performance (in ROC space) than
either under-sampling the majority class alone or varying the loss ratios in
Ripper or the class priors in Naive Bayes. Our method of over-sampling the
minority class involves creating synthetic minority class examples.
Experiments are performed using C4.5, Ripper, and a Naive Bayes classifier.
The method is evaluated using the area under the Receiver Operating
Characteristic curve (AUC) and the ROC convex hull strategy.
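The synthetic-example step lends itself to a short sketch: each synthetic point is a random interpolation between a minority example and one of its k nearest minority-class neighbours. That construction is SMOTE as published; the function signature below and the use of scikit-learn's NearestNeighbors for the neighbour search are our own illustrative choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples from the minority-class array X_min (n, d)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1 because the
    _, idx = nn.kneighbors(X_min)                         # nearest hit is self
    samples = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))                # pick a minority example
        j = idx[i][rng.integers(1, k + 1)]          # one of its k neighbours
        gap = rng.random()                          # interpolation factor in [0, 1)
        samples.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(samples)
```

The synthetic samples are then added to the training set, together with under-sampling of the majority class, before training a classifier such as C4.5, Ripper, or Naive Bayes as in the paper's experiments.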
Implementing physiotherapy Huntington's disease guidelines in clinical practice: a global survey
Background Clinical practice guidelines are often not optimally translated into clinical care. Following the publication of the Huntington’s disease (HD) physiotherapy clinical practice guidelines in 2020, the European Huntington’s Disease Network Physiotherapy Working Group (EHDN PWG) identified a need to explore perceived facilitators and barriers to their implementation. The aim of this study was to explore physiotherapists’ awareness of the 2020 guidelines and the perceived barriers and facilitators to their implementation.
Methods An observational study was carried out using an online survey. Participants were physiotherapists recruited via the EHDN and physiotherapy associations in the United Kingdom, Australia, and the United States of America. The survey used Likert scales to gather agreement and disagreement with statements about barriers to and facilitators of implementing each of the six recommendations in the guidelines.
Results There were 32 respondents: 18 from Europe, 7 from Australia, 5 from the USA, 1 from Africa, and 1 with missing location data. The majority (69%) were aware of the guidelines, and 75% spent less than 40% of their working time with clients with HD. HD-specific attributes (physical and behavioural problems and low motivation) were perceived as barriers to implementing the recommendations (≥70% agreement). Support from colleagues (81-91% agreement), an individualised plan (72-88% agreement), and physiotherapists’ expertise in HD (81-91% agreement) were facilitators of implementation for all six recommendations.
Conclusions This study is the first to explore implementation of the HD physiotherapy guidelines in clinical practice. Resources from the PWG to support physiotherapists should focus on ways to implement the recommendations specifically related to management of the physical, behavioural, and motivational problems associated with HD. This would build physiotherapists’ expertise, itself a facilitator of clinical practice guideline implementation.
One user's report on Sandia data objects: evaluation of the DOL and PMO for use in feature characterization.
The Feature Characterization project (FCDMF) has the goal of building tools that can extract and analyze coherent features in a terabyte dataset. We desire to extend our feature characterization library (FClib) to support a wider variety of complex ASCI data, and to support parallel algorithms. An attractive alternative to extending the library's internal data structures is to replace them with an externally provided data object. This report is the summary of a quick exploration of two candidate data objects in use at Sandia National Laboratories: the Data Object Library (DOL) and the Parallel Mesh Object (PMO). It is our hope that this report will provide information for potential users of the data objects, as well as feedback for the objects' developers. The data objects were evaluated as to whether they (1) supported the same capabilities as the current version of FClib, (2) provided additional required capabilities, and (3) were relatively easy to use. Both data objects met the requirements of having the same capabilities as FClib and support for parallel algorithms. However, the DOL has a richer set of data structures that more closely align with the current data structures of FClib and our planned extensions. Specifically, the DOL can support time-changing geometry, which is needed to represent features as datasets. Unfortunately, the DOL did not meet our ease-of-use requirement. The PMO was easier to learn and use, but did not support time-changing geometry. Given the above results, we will extend the FClib API (Application Programming Interface) to handle time-changing geometry. Then we will replace the internal data structures with the DOL, but we will provide the FClib API in addition to the DOL API to support simplified usage.
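As a rough sketch of the facade arrangement described above, written in Python purely for illustration: callers keep using FClib-style calls while storage is delegated to the external data object. Every class and method name here is hypothetical; neither the real FClib API nor the DOL API is shown.

```python
class DOLDataset:
    """Hypothetical stand-in for the external data object (not the real DOL)."""
    def __init__(self):
        self._steps = {}                   # time step -> geometry payload

    def put(self, step, geometry):
        self._steps[step] = geometry

    def get(self, step):
        return self._steps[step]


class FCMesh:
    """FClib-style facade: the public API stays stable while the internal
    data structures are replaced by the externally provided data object."""
    def __init__(self):
        self._dol = DOLDataset()

    def set_geometry(self, step, coords):  # time-changing geometry per step
        self._dol.put(step, coords)

    def get_geometry(self, step):
        return self._dol.get(step)
```

The point of the facade is the report's stated plan: existing FClib callers need no changes, while new code remains free to use the data object's own API directly.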