162,501 research outputs found
Psychometrics in Practice at RCEC
A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to the ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically.\ud
All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The selection of the topics and the editing intends that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment
Input Prioritization for Testing Neural Networks
Deep neural networks (DNNs) are increasingly being adopted for sensing and
control functions in a variety of safety and mission-critical systems such as
self-driving cars, autonomous air vehicles, medical diagnostics, and industrial
robotics. Failures of such systems can lead to loss of life or property, which
necessitates stringent verification and validation for providing high
assurance. Though formal verification approaches are being investigated,
testing remains the primary technique for assessing the dependability of such
systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining
test oracle data---the expected output, a.k.a. label, for a given input---is
high, which significantly impacts the amount and quality of testing that can be
performed. Thus, prioritizing input data for testing DNNs in meaningful ways to
reduce the cost of labeling can go a long way in increasing testing efficacy.
This paper proposes using gauges of the DNN's sentiment derived from the
computation performed by the model, as a means to identify inputs that are
likely to reveal weaknesses. We empirically assessed the efficacy of three such
sentiment measures for prioritization---confidence, uncertainty, and
surprise---and compare their effectiveness in terms of their fault-revealing
capability and retraining effectiveness. The results indicate that sentiment
measures can effectively flag inputs that expose unacceptable DNN behavior. For
MNIST models, the average percentage of inputs correctly flagged ranged from
88% to 94.8%
Spatial Filtering Pipeline Evaluation of Cortically Coupled Computer Vision System for Rapid Serial Visual Presentation
Rapid Serial Visual Presentation (RSVP) is a paradigm that supports the
application of cortically coupled computer vision to rapid image search. In
RSVP, images are presented to participants in a rapid serial sequence which can
evoke Event-related Potentials (ERPs) detectable in their Electroencephalogram
(EEG). The contemporary approach to this problem involves supervised spatial
filtering techniques which are applied for the purposes of enhancing the
discriminative information in the EEG data. In this paper we make two primary
contributions to that field: 1) We propose a novel spatial filtering method
which we call the Multiple Time Window LDA Beamformer (MTWLB) method; 2) we
provide a comprehensive comparison of nine spatial filtering pipelines using
three spatial filtering schemes namely, MTWLB, xDAWN, Common Spatial Pattern
(CSP) and three linear classification methods Linear Discriminant Analysis
(LDA), Bayesian Linear Regression (BLR) and Logistic Regression (LR). Three
pipelines without spatial filtering are used as baseline comparison. The Area
Under Curve (AUC) is used as an evaluation metric in this paper. The results
reveal that MTWLB and xDAWN spatial filtering techniques enhance the
classification performance of the pipeline but CSP does not. The results also
support the conclusion that LR can be effective for RSVP based BCI if
discriminative features are available
Systems design analysis applied to launch vehicle configuration
As emphasis shifts from optimum-performance aerospace systems to least lift-cycle costs, systems designs must seek, adapt, and innovate cost improvement techniques in design through operations. The systems design process of concept, definition, and design was assessed for the types and flow of total quality management techniques that may be applicable in a launch vehicle systems design analysis. Techniques discussed are task ordering, quality leverage, concurrent engineering, Pareto's principle, robustness, quality function deployment, criteria, and others. These cost oriented techniques are as applicable to aerospace systems design analysis as to any large commercial system
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
- …