A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification
Crowdsourcing has become widely used in supervised scenarios where training
sets are scarce and difficult to obtain. Most crowdsourcing models in the
literature assume labelers can provide answers to full questions. In
classification contexts, full questions require a labeler to discern among all
possible classes. Unfortunately, discernment is not always easy in realistic
scenarios. Labelers may not be experts in differentiating all classes. In this
work, we provide a full probabilistic model for a shorter type of queries. Our
shorter queries only require "yes" or "no" responses. Our model estimates a
joint posterior distribution of matrices related to labelers' confusions and
the posterior probability of the class of every object. We developed an
approximate inference approach using Monte Carlo sampling and Black Box
Variational Inference, and we provide the derivation of the necessary
gradients. We built two realistic crowdsourcing scenarios to test our model.
The first scenario queries for irregular astronomical time-series. The second
scenario relies on the image classification of animals. We achieved results
that are comparable with those of full query crowdsourcing. Furthermore, we
show that modeling labelers' failures plays an important role in estimating
true classes. Finally, we provide the community with two real datasets obtained
from our crowdsourcing experiments. All our code is publicly available.
Comment: SIAM International Conference on Data Mining (SDM19), 9 official
pages, 5 supplementary pages
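To illustrate how "yes"/"no" answers can update the probability of an object's class, here is a minimal sketch. It is not the paper's model: it assumes a single labeler whose response probabilities are already known and fixed, whereas the abstract describes jointly inferring the labelers' confusion matrices with approximate inference. The names `class_posterior` and `yes_prob`, and all numbers, are hypothetical.

```python
def class_posterior(prior, yes_prob, answers):
    """Posterior over the true class after a sequence of yes/no answers.

    prior[c]       : prior probability that the object has class c
    yes_prob[c][k] : assumed P("yes" | true class c, question "is it class k?")
    answers        : list of (asked_class, said_yes) pairs
    """
    weights = list(prior)
    for asked, said_yes in answers:
        for c in range(len(weights)):
            p_yes = yes_prob[c][asked]
            # Multiply in the likelihood of the observed answer under class c.
            weights[c] *= p_yes if said_yes else 1.0 - p_yes
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-class example with an imperfect labeler.
prior = [1 / 3, 1 / 3, 1 / 3]
yes_prob = [
    [0.9, 0.1, 0.1],  # true class 0
    [0.2, 0.8, 0.1],  # true class 1
    [0.1, 0.1, 0.9],  # true class 2
]
answers = [(0, True), (1, False), (0, True)]
posterior = class_posterior(prior, yes_prob, answers)
```

With these assumed values, two "yes" answers to "is it class 0?" and one "no" to "is it class 1?" concentrate the posterior on class 0.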
ModHMM: A Modular Supra-Bayesian Genome Segmentation Method
Genome segmentation methods are powerful tools to obtain cell type or tissue-specific genome-wide annotations and are frequently used to discover regulatory elements. However, traditional segmentation methods show low predictive accuracy and their data-driven annotations have some undesirable properties. As an alternative, we developed ModHMM, a highly modular genome segmentation method. Inspired by the supra-Bayesian approach, it incorporates predictions from a set of classifiers. This allows genome segmentations to be computed using state-of-the-art methodology. We demonstrate the method on ENCODE data and show that it outperforms traditional segmentation methods not only in terms of predictive performance, but also in qualitative aspects. Therefore, ModHMM is a valuable alternative for studying the epigenetic and regulatory landscape across and within cell types or tissues.
Input Prioritization for Testing Neural Networks
Deep neural networks (DNNs) are increasingly being adopted for sensing and
control functions in a variety of safety and mission-critical systems such as
self-driving cars, autonomous air vehicles, medical diagnostics, and industrial
robotics. Failures of such systems can lead to loss of life or property, which
necessitates stringent verification and validation for providing high
assurance. Though formal verification approaches are being investigated,
testing remains the primary technique for assessing the dependability of such
systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining
test oracle data---the expected output, a.k.a. label, for a given input---is
high, which significantly impacts the amount and quality of testing that can be
performed. Thus, prioritizing input data for testing DNNs in meaningful ways to
reduce the cost of labeling can go a long way in increasing testing efficacy.
This paper proposes using gauges of the DNN's sentiment derived from the
computation performed by the model, as a means to identify inputs that are
likely to reveal weaknesses. We empirically assessed the efficacy of three such
sentiment measures for prioritization---confidence, uncertainty, and
surprise---and compared their effectiveness in terms of their fault-revealing
capability and retraining effectiveness. The results indicate that sentiment
measures can effectively flag inputs that expose unacceptable DNN behavior. For
MNIST models, the average percentage of inputs correctly flagged ranged from
88% to 94.8%.
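The prioritization idea can be sketched as ranking unlabeled inputs by a score computed from the model's own outputs, so that the most suspicious inputs are labeled first. The abstract does not define its measures, so this sketch makes its own assumptions: confidence is taken as the maximum softmax probability and uncertainty as the Shannon entropy of the softmax output, and the surprise measure is omitted. The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prioritize(logit_rows, score="confidence"):
    """Return input indices ordered from most to least suspicious.

    score="confidence": lowest maximum softmax probability first (assumed measure)
    score="entropy"   : highest softmax entropy first (assumed measure)
    """
    scored = []
    for i, logits in enumerate(logit_rows):
        probs = softmax(logits)
        if score == "confidence":
            key = max(probs)
        elif score == "entropy":
            key = -entropy(probs)  # negate so that high entropy sorts first
        else:
            raise ValueError("unknown score: " + score)
        scored.append((key, i))
    scored.sort()
    return [i for _, i in scored]


# Hypothetical logits for three inputs of a three-class model.
example_logits = [
    [9.0, 0.5, 0.1],  # very confident prediction
    [1.0, 1.1, 0.9],  # nearly uniform prediction
    [3.0, 2.5, 0.2],  # two competing classes
]
order = prioritize(example_logits)
```

Under both assumed scores the nearly uniform prediction is ranked first and the very confident one last, so a limited labeling budget would be spent on the inputs the model is least sure about.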