16 research outputs found
Performance-Efficiency Comparisons of Channel Attention Modules for ResNets
Attention modules can be added to neural network architectures to improve performance. This work presents an extensive comparison between several efficient attention modules for image classification and object detection, in addition to proposing a novel Attention Bias module with lower computational overhead. All measured attention modules have been efficiently re-implemented, which allows an objective comparison and evaluation of the relationship between accuracy and inference time. Our measurements show that single-image inference time increases far more (5–50%) than the increase in FLOPs suggests (0.2–3%) for a limited gain in accuracy, making computation cost an important selection criterion. Despite this increase in inference time, adding an attention module can outperform a deeper baseline ResNet in both speed and accuracy. Finally, we investigate the potential of adding attention modules to pretrained networks and show that fine-tuning is possible and superior to training from scratch. The choice of the best attention module strongly depends on the specific ResNet architecture, input resolution, batch size and inference framework.
Tracklet-based vessel re-identification for multicamera vessel-speed enforcement
In crowded waterways, maritime traffic is bound to speed regulations for safety reasons. Although several speed measurement techniques exist for road traffic, such systems are not available for vessels. This paper proposes a new approach for tracklet-based re-identification (re-ID) as a solution for vessel-speed enforcement. For evaluation, we use the Vessel-reID dataset that we introduced in previous work [2]. The core of the tracklet re-ID approach is a novel Tracklet-based Querying Procedure, a more effective alternative to the Common Querying Procedure (CQP) found in popular re-ID datasets [7, 8]. The existing procedure randomly selects a single image from the whole query-vessel trajectory (in one camera view). This is improved by (1) detecting a set of the most representative images per tracklet of a query vessel, and by (2) raising the matching accuracy by accumulating the gallery similarity scores for all images in the set. In the experimental validation, we adopt two well-known person re-ID algorithms, TriNet [3] and MGN [6], since most re-ID literature focuses on person re-ID. Results show a significant increase in performance when applying the tracklet-based approach instead of CQP: a gain of 5.6% and 8.1% Rank-1 for MGN and TriNet, respectively.
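Step (2) above, accumulating gallery similarity scores over a set of query images, can be illustrated with a minimal sketch. The abstract does not state the similarity measure or the accumulation rule, so cosine similarity and a plain sum are assumptions here, and the function name `rank_gallery_by_tracklet` is hypothetical. The selection of representative images in step (1) is not modeled; the query set is taken as given.

```python
import numpy as np


def rank_gallery_by_tracklet(query_feats, gallery_feats):
    """Rank gallery entries against a set of query images from one tracklet.

    query_feats:   array of shape (n_query_images, d), one embedding per image
    gallery_feats: array of shape (n_gallery, d), one embedding per gallery entry

    Assumption: similarity is cosine similarity and scores are accumulated
    by summation over the query images; the source does not specify either.
    """
    query_feats = np.asarray(query_feats, dtype=float)
    gallery_feats = np.asarray(gallery_feats, dtype=float)

    # L2-normalize so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)

    # Similarity of every query image to every gallery entry.
    sims = q @ g.T  # shape (n_query_images, n_gallery)

    # Accumulate the per-image scores into one score per gallery entry.
    accumulated = sims.sum(axis=0)

    # Gallery indices sorted from best to worst match.
    ranking = np.argsort(-accumulated)
    return ranking, accumulated


# Toy example: two query images that both point roughly at gallery entry 1.
gallery = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [[0.1, 1.0], [-0.1, 1.0]]
ranking, scores = rank_gallery_by_tracklet(query, gallery)
```

In the toy example each query image on its own is slightly biased towards a different wrong gallery entry, and summing the scores cancels that bias, which is the intuition behind querying with several images instead of one.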
Hierarchical Object Detection and Classification Using SSD Multi-Loss
When merging existing similar datasets, it would be attractive to benefit from a higher detection rate of objects and the additional partial ground-truth samples for improving object classification. To this end, a novel CNN detector with a hierarchical binary classification system is proposed. The detector is based on the Single-Shot multibox Detector (SSD) and inspired by the hierarchical classification used in the YOLO9000 detector. Localization and classification are separated during training, by introducing a novel loss term that handles hierarchical classification in the loss function (SSD-ML). We experiment with the proposed SSD-ML detector on the generic PASCAL VOC dataset and show that additional super-categories can be learned with minimal impact on the overall accuracy. Furthermore, we find that not all objects are required to have classification label information as classification performance only drops from 73.3 % to 70.6 % while 60 % of the label information is removed. The flexibility of the detector with respect to the different levels of details in label definitions is investigated for a traffic surveillance application, involving public and proprietary datasets with non-overlapping class definitions. Including classification label information from our dataset raises the performance significantly from 70.7 % to 82.2 %. The experiments show that the desired hierarchical labels can be learned from the public datasets, while only using box information from our dataset. In general, this shows that it is possible to combine existing datasets with similar object classes and partial annotations and benefit in terms of growth of detection rate and improved class categorization performance.
AnonImMed: An Open-Source Tool for Fast Anonymization of Medical Images
In the past few years, several datasets that include privacy-sensitive information have been taken offline due to privacy concerns and new laws such as the GDPR regulation in Europe. Medical images contain patient information that is regulated even more strictly. Deep learning has become the standard for Computer Aided Diagnosis (CAD) based on medical images and requires large amounts of data to achieve good performance. The current standard in medical imaging is to print patient text directly over the measurement image, sometimes even partly occluding tissue pixels. This privacy-sensitive patient data is not required for general medical imaging research. Thus, anonymizing these medical images enables researchers to use this data for research purposes. This paper describes a method to automatically detect and remove text from medical images at high processing speeds. We base our method on the EAST text detector [1] and make the following four contributions: (1) open-source implementation of the anonymization tool; (2) method to generate large amounts of synthetic training text; (3) multiple optimizations to improve the processing speed of anonymization
Gender classification in low-resolution surveillance video: In-depth comparison of random forests and SVMs
This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intraclass variation and interclass correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets, which realistically consider the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, proving a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9±0.2%, while classification computation time is negligible compared to the detection time of pedestrians.
Rare-class extraction using cascaded pretrained networks applied to crane classification
Overweight vehicles are a common source of pavement and bridge damage. Mobile crane vehicles in particular often exceed legal per-axle weight limits, carrying their lifting blocks and ballast on the vehicle instead of on a separate trailer. To prevent road deterioration, the detection of overweight cranes is desirable for law enforcement. As the source of crane weight is visible, we propose a camera-based detection system based on convolutional neural networks. We iteratively label our dataset to vastly reduce the labeling effort and extensively investigate the impact of image resolution, network depth and dataset size to choose optimal parameters during iterative labeling. We show that iterative labeling with intelligently chosen image resolutions and network depths can vastly improve (up to 70×) the speed at which data can be labeled, to train classification systems for practical surveillance applications. The experiments provide an estimate of the optimal amount of data required to train an effective classification system, which is valuable for classification problems in general. The proposed system achieves an AUC score of 0.985 for distinguishing cranes from other vehicles and an AUC of 0.92 and 0.77 on lifting block and ballast classification, respectively. The proposed classification system enables effective road monitoring for semi-automatic law enforcement and is attractive for rare-class extraction in general surveillance classification problems.
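For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how a ROC AUC can be computed from binary labels and classifier scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half). The function name `auc_score` and the toy data are hypothetical and are not taken from the paper.

```python
def auc_score(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth labels (1 = positive, e.g. crane)
    scores: iterable of classifier scores, higher means more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is ranked wrongly.
example_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic pairwise loop is chosen for readability; for large datasets a rank-based formulation gives the same value more efficiently.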