W2WNet: A two-module probabilistic Convolutional Neural Network with embedded data cleansing functionality
Ideally, Convolutional Neural Networks (CNNs) should be trained on high-quality images with minimal noise and correct ground-truth labels. Nonetheless, in many real-world scenarios such high quality is very hard to obtain, and datasets may be affected by all sorts of image degradation and mislabelling issues. This negatively impacts the performance of standard CNNs during both the training and the inference phase. To address this issue we propose Wise2WipedNet (W2WNet), a new two-module Convolutional Neural Network, where a Wise module exploits Bayesian inference to identify and discard spurious images during training, and a Wiped module takes care of the final classification while broadcasting information on prediction confidence at inference time. The effectiveness of our solution is demonstrated on a number of public benchmarks addressing different image classification tasks, as well as on a real-world case study on histological image analysis. Overall, our experiments demonstrate that W2WNet is able to identify image degradation and mislabelling issues both at training and at inference time, with a positive impact on the final classification accuracy.
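The paper does not publish its code here, but the idea of using Bayesian predictive uncertainty to discard spurious training images can be illustrated with a minimal numpy sketch. Everything below (the `flag_spurious` helper, the entropy threshold, the toy arrays) is an illustrative assumption, not W2WNet's actual implementation: it flags samples whose mean predictive entropy across stochastic forward passes (e.g. MC-dropout runs) is high, a common proxy for mislabelled or degraded inputs.

```python
import numpy as np

def flag_spurious(mc_probs, threshold=0.5):
    """Flag samples whose predictive entropy exceeds a threshold.

    mc_probs: array of shape (T, N, C) -- class probabilities from T
    stochastic (e.g. MC-dropout) forward passes over N samples.
    Returns a boolean mask of length N (True = likely spurious).
    """
    mean_probs = mc_probs.mean(axis=0)  # average over the T passes -> (N, C)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return entropy > threshold

# Toy example: one confidently classified sample, one highly uncertain one.
confident = np.tile([[0.95, 0.05]], (10, 1, 1))  # shape (T=10, N=1, C=2)
uncertain = np.tile([[0.50, 0.50]], (10, 1, 1))
mc_probs = np.concatenate([confident, uncertain], axis=1)  # (10, 2, 2)
print(flag_spurious(mc_probs))  # only the uncertain sample is flagged
```

In a real training loop the flagged samples would be dropped (or down-weighted) before the classification module is fit, and the same uncertainty score could be broadcast alongside predictions at inference time.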
Random projections: data perturbation for classification problems
Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where we have many covariates recorded for each observation. In classification problems there are two general techniques using random projections. The first involves many projections in an ensemble: the idea is to aggregate the results after applying different random projections, with the aim of achieving superior statistical accuracy. The second class of methods includes hashing and sketching techniques, which are straightforward ways to reduce the complexity of a problem, potentially with a huge computational saving, while approximately preserving statistical efficiency.
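The first (ensemble) technique can be sketched in a few lines of numpy. The choices below are illustrative assumptions, not the paper's method: Gaussian projection matrices, a nearest-centroid base classifier on each projected dataset, and aggregation by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

def rp_ensemble_predict(X_train, y_train, X_test, d=5, n_proj=25):
    """Majority vote over nearest-centroid classifiers, each fit after a
    different Gaussian random projection from p down to d dimensions."""
    p = X_train.shape[1]
    classes = np.unique(y_train)
    votes = np.zeros((X_test.shape[0], classes.size))
    for _ in range(n_proj):
        A = rng.standard_normal((p, d)) / np.sqrt(d)   # random projection
        Z_tr, Z_te = X_train @ A, X_test @ A
        centroids = np.stack([Z_tr[y_train == c].mean(axis=0) for c in classes])
        dists = ((Z_te[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        votes[np.arange(X_test.shape[0]), dists.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]

# Toy example: two well-separated Gaussian classes in 100 dimensions.
X0 = rng.standard_normal((50, 100))
X1 = rng.standard_normal((50, 100)) + 3.0
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
pred = rp_ensemble_predict(X, y, X, d=5, n_proj=25)
print((pred == y).mean())  # near-perfect accuracy on this easy problem
```

Each base classifier only ever sees a d-dimensional dataset, so the fit cost per projection is small; the vote across projections is what recovers statistical accuracy lost by any single projection.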