17 research outputs found

    Diffeomorphic transforms for data augmentation of highly variable shape and texture objects

    Background and objective: Training a deep convolutional neural network (CNN) for automatic image classification requires a large database of labeled sample images. However, in some applications, such as biology and medicine, only a few experts can correctly categorize each sample. Experts can identify small changes in shape and texture that go unnoticed by untrained people, and can distinguish between objects in the same class that present drastically different shapes and textures. As a result, currently available databases are too small to train deep learning models from scratch. To deal with this problem, data augmentation techniques are commonly used to increase the dataset size. However, typical data augmentation methods introduce artifacts or apply distortions to the original image; instead of creating new realistic samples, they produce basic spatial variations of the original ones. Methods: We propose a novel data augmentation procedure that generates new realistic samples by combining two samples that belong to the same class. Although the idea behind the method is to mimic the variations that diatoms experience in different stages of their life cycle, it has also been demonstrated in glomeruli and pollen identification problems. The procedure is based on morphing and image registration methods that perform diffeomorphic transformations. Results: The proposed technique achieves an increase in accuracy over existing techniques of 0.47%, 1.47%, and 0.23% for the diatom, glomeruli, and pollen problems, respectively. Conclusions: For the Diatom dataset, the method is able to simulate the shape changes in different diatom life cycle stages, so the generated images resemble newly acquired samples with intermediate shapes. In fact, the other methods compared obtained worse results than training without data augmentation. For the Glomeruli dataset, the method is able to add new samples with different shapes and degrees of sclerosis (through different textures). This is the case where the proposed data augmentation (DA) method is most beneficial: when objects differ greatly in both shape and texture. Finally, for the Pollen dataset, there are only small variations between samples in a few classes, and the dataset has other features, such as noise, that are likely to benefit other existing DA techniques; even so, the method still improves the results. The authors acknowledge financial support of the Spanish Government and Junta de Comunidades de Castilla-La Mancha under projects AQUALITAS (Ref. CTM2014-51907-C2-R-MINECO), HYPERDEEP (Ref. SBPLY/19/180501/000273), and APRENDAMOS (Ref. SBPLY/17/180501/000543). They would also like to extend the acknowledgment to technicians Enrique Cepeda and Jesus Diaz for their help in running some experiments.

    False positive reduction in detection problems

    It is important for detection systems to obtain high detection rates. However, when these systems are configured to increase the detection rate (Dr), the false positive rate (FPr) also increases. In practice, because negative events are so frequent, even low false positive rates produce an unacceptably high number of false positive detections. Therefore, methods for reducing the number of false positives are needed. Since a detection system involves several steps before the decision is made, the detector can be improved by reducing the FPr at different stages. On the one hand, in this thesis an image classification step is applied prior to detection. We show that this step can help reduce the final FPr obtained by the detector. This pre-detection classification step is applied to mammograms, classifying them according to the parenchymal densities specified in BI-RADS (Breast Imaging Reporting and Data System). As a result, a novel hierarchical procedure based on weighted classifiers and texture features has been proposed for breast parenchymal classification. The proposed approach has been tested using 10FCV (10-fold cross-validation) and LOOCV (leave-one-out cross-validation) with the public mini-MIAS database and a large FFDM (Full Field Digital Mammography) database from local hospitals. The obtained results improve upon previously reported results. Moreover, a breast CADe (Computer Aided Detection) system has been developed to incorporate breast parenchymal density information. The results show that the proposed classification helps to adjust the parameters of the CADe algorithms and decrease the false positive rate. On the other hand, and also with the objective of reducing false positives in detection systems, this thesis focuses on the widely used cascade detector. A sample selection method for training cascade detectors is proposed to achieve good detection/false positive ratios. The method is based on selecting the most informative false positive samples generated in one stage and using them to feed the next stage. The proposed cascade framework with sample selection was compared with other cascade detectors using different databases and feature sets. The effectiveness of the method was assessed with the average partial AUC (Area Under the Curve) from the ROC (Receiver Operating Characteristic) curves obtained with 10FCV. The results show that the proposed cascade detector with sample selection obtains on average better results.

    Sample Selection for Training Cascade Detectors.

    Automatic detection systems usually require large and representative training datasets in order to obtain good detection and false positive rates. In these datasets, the positive set typically has few samples, while the negative set should represent anything except the object of interest. Consequently, the negative set typically contains orders of magnitude more images than the positive set. However, imbalanced training databases lead to biased classifiers. In this paper, we focus on a negative sample selection method to properly balance the training data for cascade detectors. The method is based on selecting the most informative false positive samples generated in one stage to feed the next stage. The results show that the proposed cascade detector with sample selection obtains on average better partial AUC and smaller standard deviation than the other compared cascade detectors.
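The selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not define "most informative", so the code below assumes it means the negatives that the current stage scores highest, that is, the ones it most confidently mistakes for positives. The names `select_false_positives`, `stage_score`, `threshold`, and `max_samples` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def select_false_positives(
    negatives: Sequence[Sample],
    stage_score: Callable[[Sample], float],
    threshold: float,
    max_samples: int,
) -> List[Sample]:
    """Pick negatives that the current stage wrongly accepts.

    Assumption: "most informative" is read here as "highest score",
    i.e. the negatives the stage is most confident are positives.
    """
    # Score every negative sample with the current stage.
    scored = [(stage_score(x), x) for x in negatives]
    # A negative accepted by the stage (score >= threshold) is a false positive.
    false_positives = [(s, x) for s, x in scored if s >= threshold]
    # Keep the highest-scoring false positives as negatives for the next stage.
    false_positives.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in false_positives[:max_samples]]


# Toy usage: each "sample" is already its own score.
toy_negatives = [0.1, 0.9, 0.6, 0.4, 0.75]
next_stage_negatives = select_false_positives(
    toy_negatives, stage_score=lambda x: x, threshold=0.5, max_samples=2
)
```

Capping the selection at `max_samples` is one simple way to keep the negative set for the next stage comparable in size to the positive set; the balancing rule actually used in the paper may differ.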

    Proposed cascade training algorithm.

    pAUC results.

    Results obtained from applying the different cascade detectors over the different dataset and feature set combinations. The table shows the average pAUC values with their corresponding standard deviation σ. The best pAUC for each pair of database and feature set is highlighted in bold.

    Two ROCs, A and B, with the same AUC but different pAUC.

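The situation named in this caption can be reproduced with a small numerical sketch. The two piecewise-linear curves below are made-up illustrations, not the curves from the figure, and the false positive rate cutoff of 0.2 is an arbitrary assumption. The helper `partial_auc` is a hypothetical name; it integrates the curve with the trapezoidal rule and interpolates linearly at the cutoff.

```python
def partial_auc(fpr, tpr, max_fpr=1.0):
    """Trapezoidal area under an ROC curve for FPR in [0, max_fpr].

    fpr must be strictly increasing; tpr holds the matching true positive rates.
    """
    area = 0.0
    points = list(zip(fpr, tpr))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr using linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative curves (assumed values): A rises quickly, B rises later.
roc_a = ([0.0, 0.2, 1.0], [0.0, 0.8, 1.0])
roc_b = ([0.0, 0.2, 0.4, 1.0], [0.0, 0.5, 1.0, 1.0])

auc_a, auc_b = partial_auc(*roc_a), partial_auc(*roc_b)
pauc_a, pauc_b = partial_auc(*roc_a, max_fpr=0.2), partial_auc(*roc_b, max_fpr=0.2)
```

With these values both full areas come out at about 0.80, while the partial areas up to an FPR of 0.2 are about 0.08 for A and 0.05 for B, so the pAUC separates two detectors that the full AUC cannot.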

    Base Haar-like rectangle features.

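As background for this caption, a Haar-like rectangle feature is commonly computed as the difference between pixel sums over adjacent rectangles, with an integral image making each rectangle sum a four-lookup operation. The sketch below shows one two-rectangle feature only; the function names, the left-minus-right sign convention, and the toy image are assumptions, and the figure itself may show a different set of base features.

```python
def integral_image(image):
    """Return a (rows+1) x (cols+1) summed-area table with a zero border."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half (sign convention is an assumption)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy image with a bright left half and a dark right half.
toy_image = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
toy_ii = integral_image(toy_image)
edge_response = two_rect_horizontal(toy_ii, top=0, left=0, height=2, width=4)
```

The zero border in the table avoids special cases for rectangles that touch the top or left edge of the image.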

    First-order statistics.

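The caption does not list which statistics are used. As a hedged illustration, the sketch below computes a set commonly grouped under first-order (histogram-based) texture statistics: mean, variance, skewness, kurtosis, and entropy of the pixel intensities. The function name and the choice of population (divide-by-n) moments are assumptions, not details from the source.

```python
import math
from collections import Counter


def first_order_statistics(pixels):
    """Histogram-based statistics of a flat sequence of pixel intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    centered = [p - mean for p in pixels]
    variance = sum(d ** 2 for d in centered) / n  # population variance
    std = math.sqrt(variance)
    if std > 0:
        skewness = sum(d ** 3 for d in centered) / n / std ** 3
        kurtosis = sum(d ** 4 for d in centered) / n / std ** 4
    else:
        # A constant region has no spread; report zeros by convention.
        skewness = kurtosis = 0.0
    # Shannon entropy (in bits) of the empirical intensity distribution.
    probabilities = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


toy_stats = first_order_statistics([0, 0, 2, 2])
```

These statistics depend only on the intensity histogram, so they ignore the spatial arrangement of pixels.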