COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
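As a concrete illustration of the lazy-evaluation idea, here is a minimal Python sketch under our own assumptions, not code from the paper: members are evaluated one at a time, and voting stops once a normal-approximation confidence interval around the running mean vote excludes the decision boundary. The function name, the Welford variance update, the z threshold, and the binary 0/1 voting are all illustrative choices.

```python
import math

# Hedged sketch of Gaussian lazy ensemble evaluation: evaluate members
# sequentially and stop as soon as the prediction is statistically decided.
# All names and thresholds here are assumptions, not taken from the paper.
def lazy_predict(members, x, z=2.576, min_votes=10):
    """Binary prediction for x, evaluating as few ensemble members as possible."""
    n, mean, m2 = 0, 0.0, 0.0            # Welford running mean/variance of votes
    for tree in members:
        vote = float(tree.predict(x))    # each member's vote assumed in {0.0, 1.0}
        n += 1
        delta = vote - mean
        mean += delta / n
        m2 += delta * (vote - mean)
        if n >= min_votes:
            stderr = math.sqrt(m2 / (n - 1) / n)
            # Stop early once the z-level interval around the mean vote
            # no longer contains the 0.5 decision boundary.
            if abs(mean - 0.5) > z * stderr:
                break
    return int(mean > 0.5)
```

When the early votes are unanimous the running variance is zero, so easy points exit after `min_votes` members; that is where the large evaluation savings would come from.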
ASCI visualization tool evaluation, Version 2.0
The charter of the ASCI Visualization Common Tools subgroup was to investigate and evaluate 3D scientific visualization tools. As part of that effort, a Tri-Lab evaluation effort was launched in February of 1996. The first step was to agree on a thoroughly documented list of 32 features against which all tool candidates would be evaluated. These evaluation criteria were both gleaned from a user survey and determined from informed extrapolation into the future, particularly concerning the 3D nature and extremely large size of ASCI data sets. The second step was to winnow a field of 41 candidate tools down to 11. The selection principle was to be as inclusive as practical, retaining every tool that seemed to hold any promise of fulfilling all of ASCI's visualization needs. These 11 tools were then closely investigated by volunteer evaluators distributed across LANL, LLNL, and SNL. This report contains the results of those evaluations, as well as a discussion of the evaluation philosophy and criteria.
Extraction of cloud statistics from whole sky imaging cameras
Computer codes have been developed to extract basic cloud statistics from whole sky imaging (WSI) cameras. This report documents, on an algorithmic level, the steps and processes underlying these codes. Appendices comment on code details and on how to adapt to future changes in either the source camera or the host computer.
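The report's own algorithms are not reproduced here, but as a rough sketch of the kind of statistic involved, a common whole-sky-imager technique classifies each pixel as cloud or clear sky by thresholding the red/blue channel ratio. The function name, threshold value, and masking scheme below are assumptions for illustration, not details from the report.

```python
import numpy as np

def cloud_fraction(rgb, threshold=0.6, sky_mask=None):
    """Fraction of sky pixels classified as cloud in an RGB sky image.

    rgb: (H, W, 3) array; sky_mask: optional boolean mask of the sky dome.
    The 0.6 red/blue threshold is an illustrative assumption."""
    red = rgb[..., 0].astype(float)
    blue = rgb[..., 2].astype(float)
    ratio = red / np.maximum(blue, 1e-6)   # guard against division by zero
    cloudy = ratio > threshold             # clear sky is blue-rich (low ratio)
    if sky_mask is not None:
        cloudy = cloudy[sky_mask]
    return float(np.mean(cloudy))
```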
SMOTE: Synthetic Minority Over-sampling Technique
An approach to the construction of classifiers from imbalanced datasets is
described. A dataset is imbalanced if the classification categories are not
approximately equally represented. Often, real-world data sets are predominantly
composed of "normal" examples with only a small percentage of "abnormal" or
"interesting" examples. It is also the case that the cost of misclassifying an
abnormal (interesting) example as a normal example is often much higher than
the cost of the reverse error. Under-sampling of the majority (normal) class
has been proposed as a good means of increasing the sensitivity of a classifier
to the minority class. This paper shows that combining our method of
over-sampling the minority (abnormal) class with under-sampling the majority
(normal) class achieves better classifier performance (in ROC space) than
either under-sampling the majority class alone or varying the loss ratios in
Ripper or the class priors in Naive Bayes. Our method of over-sampling the
minority class involves creating synthetic minority class examples.
Experiments are performed using C4.5, Ripper, and a Naive Bayes classifier.
The method is evaluated using the area under the Receiver Operating
Characteristic curve (AUC) and the ROC convex hull strategy.
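The synthetic-example step lends itself to a short sketch: each synthetic point is a random interpolation between a minority example and one of its k nearest minority-class neighbours. That construction is SMOTE as published; the function signature below and the use of scikit-learn's NearestNeighbors for the neighbour search are our own illustrative choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples from the minority-class array X_min (n, d)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1 because the
    _, idx = nn.kneighbors(X_min)                         # nearest hit is self
    samples = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))                # pick a minority example
        j = idx[i][rng.integers(1, k + 1)]          # one of its k neighbours
        gap = rng.random()                          # interpolation factor in [0, 1)
        samples.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(samples)
```

The synthetic samples are then added to the training set, together with under-sampling of the majority class, before training a classifier such as C4.5, Ripper, or Naive Bayes as in the paper's experiments.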
Implementing physiotherapy Huntington's disease guidelines in clinical practice: a global survey
Background Clinical practice guidelines are often not optimally translated into clinical care. Following the publication of the Huntington’s disease (HD) physiotherapy clinical practice guidelines in 2020, the European Huntington’s Disease Network Physiotherapy Working Group (EHDN PWG) identified a need to explore perceived facilitators and barriers to their implementation. The aim of this study was to explore physiotherapists’ awareness of the 2020 guidelines and the perceived barriers and facilitators to their implementation.
Methods An observational study was carried out using an online survey. Participants were physiotherapists recruited via the EHDN and physiotherapy associations in the United Kingdom, Australia, and the United States of America. The survey used Likert scales to gather agreement and disagreement with statements about barriers to and facilitators of implementing each of the six recommendations in the guidelines.
Results There were 32 respondents: 18 from Europe, 7 from Australia, 5 from the USA, 1 from Africa, and 1 with missing location data. The majority (69%) were aware of the guidelines, and 75% spent less than 40% of their working time with clients with HD. HD-specific attributes (physical and behavioural problems and low motivation) were perceived as barriers to implementing the recommendations (≥70% agreement). Support from colleagues (81-91% agreement), an individualised plan (72-88% agreement), and physiotherapists’ expertise in HD (81-91% agreement) were facilitators of implementation for all six recommendations.
Conclusions This study is the first to explore implementation of the HD physiotherapy guidelines in clinical practice. Resources from the PWG to support physiotherapists should focus on ways to implement the recommendations specifically related to management of the physical, behavioural, and motivational problems associated with HD. This would build physiotherapists’ expertise, itself a facilitator of clinical practice guideline implementation.
One user's report on Sandia data objects: evaluation of the DOL and PMO for use in feature characterization.
The Feature Characterization project (FCDMF) has the goal of building tools that can extract and analyze coherent features in a terabyte dataset. We desire to extend our feature characterization library (FClib) to support a wider variety of complex ASCI data, and to support parallel algorithms. An attractive alternative to extending the library's internal data structures is to replace them with an externally provided data object. This report is the summary of a quick exploration of two candidate data objects in use at Sandia National Laboratories: the Data Object Library (DOL) and the Parallel Mesh Object (PMO). It is our hope that this report will provide information for potential users of the data objects, as well as feedback for the objects' developers. The data objects were evaluated as to whether they (1) supported the same capabilities as the current version of FClib, (2) provided additional required capabilities, and (3) were relatively easy to use. Both data objects met the requirements of having the same capabilities as FClib and support for parallel algorithms. However, the DOL has a richer set of data structures that more closely align with the current data structures of FClib and our planned extensions. Specifically, the DOL can support time-changing geometry, which is needed to represent features as datasets. Unfortunately, the DOL did not meet our ease-of-use requirement. The PMO was easier to learn and use, but did not support time-changing geometry. Given the above results, we will extend the FClib API (Application Programming Interface) to handle time-changing geometry. Then we will replace the internal data structures with the DOL, but we will provide the FClib API in addition to the DOL API to support simplified usage.
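As a rough sketch of the facade arrangement described above, written in Python purely for illustration: callers keep using FClib-style calls while storage is delegated to the external data object. Every class and method name here is hypothetical; neither the real FClib API nor the DOL API is shown.

```python
class DOLDataset:
    """Hypothetical stand-in for the external data object (not the real DOL)."""
    def __init__(self):
        self._steps = {}                   # time step -> geometry payload

    def put(self, step, geometry):
        self._steps[step] = geometry

    def get(self, step):
        return self._steps[step]


class FCMesh:
    """FClib-style facade: the public API stays stable while the internal
    data structures are replaced by the externally provided data object."""
    def __init__(self):
        self._dol = DOLDataset()

    def set_geometry(self, step, coords):  # time-changing geometry per step
        self._dol.put(step, coords)

    def get_geometry(self, step):
        return self._dol.get(step)
```

The point of the facade is the report's stated plan: existing FClib callers need no changes, while new code remains free to use the data object's own API directly.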