Interleaved Deep Artifacts-Aware Attention Mechanism for Concrete Structural Defect Classification.
Automatic machine classification of concrete structural defects in images poses significant challenges because of a multitude of problems arising from the surface texture, such as the presence of stains, holes, colors, poster remains, graffiti, markings and paint, along with uncontrolled weather and illumination conditions. In this paper, we propose an interleaved deep artifacts-aware attention mechanism (iDAAM) to classify multi-target multi-class and single-class defects from structural defect images. Our novel architecture is composed of interleaved fine-grained dense modules (FGDM) and concurrent dual attention modules (CDAM) to extract local discriminative features from concrete defect images. FGDM helps to aggregate robust multi-layer information over a wide range of scales to describe visually similar overlapping defects. CDAM, in turn, selects multiple representations of highly localized overlapping defect features and encodes the crucial spatial regions from discriminative channels to address variations in the texture, viewing angle, shape and size of overlapping defect classes. Within iDAAM, FGDM and CDAM are interleaved to extract salient discriminative features at multiple scales in an end-to-end trainable network without any preprocessing steps, making the process fully automatic. Experimental results and extensive ablation studies on three publicly available large concrete defect datasets show that our proposed approach outperforms current state-of-the-art methodologies.
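The concurrent dual attention idea in the abstract above can be sketched minimally: a channel-attention branch and a spatial-attention branch computed in parallel on the same feature map, then fused. This is an illustrative simplification under assumed shapes and a fusion-by-summation choice, not the paper's exact CDAM layers.

```python
import numpy as np

def concurrent_dual_attention(fmap):
    """Toy CDAM-style block: channel and spatial attention computed
    concurrently on a (C, H, W) feature map, fused by summation.
    The pooling/activation choices here are illustrative assumptions."""
    C, H, W = fmap.shape
    # Channel branch: global average pooling -> softmax weights per channel
    gap = fmap.reshape(C, -1).mean(axis=1)          # (C,)
    ch_w = np.exp(gap) / np.exp(gap).sum()          # softmax over channels
    ch_out = fmap * ch_w[:, None, None]
    # Spatial branch: channel-wise mean -> sigmoid mask over locations
    sp = fmap.mean(axis=0)                          # (H, W)
    sp_mask = 1.0 / (1.0 + np.exp(-sp))
    sp_out = fmap * sp_mask[None, :, :]
    # Concurrent branches fused; output keeps the input shape,
    # so the block can be interleaved with dense modules
    return ch_out + sp_out
```

Because the output shape matches the input, such a block can be stacked between dense modules, mirroring the interleaving the abstract describes.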
UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition
Advances in image restoration and enhancement techniques have led to
discussion about how such algorithms can be applied as a pre-processing step to
improve automatic visual recognition. In principle, techniques like deblurring
and super-resolution should yield improvements by de-emphasizing noise and
increasing signal in an input image. But the historically divergent goals of
the computational photography and visual recognition communities have created a
significant need for more work in this direction. To facilitate new research,
we introduce a new benchmark dataset called UG^2, which contains three
difficult real-world scenarios: uncontrolled videos taken by UAVs and manned
gliders, as well as controlled videos taken on the ground. Over 160,000
annotated frames for hundreds of ImageNet classes are available, which are used
for baseline experiments that assess the impact of known and unknown image
artifacts and other conditions on common deep learning-based object
classification approaches. Further, current image restoration and enhancement
techniques are evaluated by determining whether or not they improve baseline
classification performance. Results show that there is plenty of room for
algorithmic innovation, making this dataset a useful tool going forward.
Comment: Supplemental material: https://goo.gl/vVM1xe, Dataset: https://goo.gl/AjA6En, CVPR 2018 Prize Challenge: ug2challenge.or
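The evaluation protocol described above — checking whether a restoration or enhancement step helps recognition — reduces to comparing classification accuracy on raw versus enhanced frames. A minimal sketch, in which `classifier`, `enhance`, and the frame/label types are illustrative assumptions rather than the benchmark's actual API:

```python
def restoration_gain(classifier, frames, labels, enhance):
    """Return the change in classification accuracy when `enhance`
    is applied as a pre-processing step (positive = enhancement helps).
    All names here are hypothetical stand-ins for a real pipeline."""
    def accuracy(inputs):
        correct = sum(classifier(x) == y for x, y in zip(inputs, labels))
        return correct / len(labels)
    return accuracy([enhance(f) for f in frames]) - accuracy(frames)
```

A positive gain means the enhancement technique improved baseline recognition; the benchmark's finding is that many current techniques do not.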
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information thereby are multiplying with, as a first essential step, document
layout analysis. If the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
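The multimodal fusion described above can be sketched at its simplest: channel-concatenate a visual feature map with a spatially aligned textual feature map, then apply a 1x1 convolution (a per-pixel linear map) to produce segmentation logits. Shapes and the late-fusion scheme are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def fuse_and_segment(visual, textual, weights):
    """Toy multimodal segmentation head.
    visual:  (Cv, H, W) visual features
    textual: (Ct, H, W) text-embedding features aligned to the page grid
    weights: (K, Cv+Ct) 1x1-conv weights mapping channels to K classes
    Returns a (H, W) map of predicted class indices."""
    fused = np.concatenate([visual, textual], axis=0)   # (Cv+Ct, H, W)
    # 1x1 convolution == linear map over channels at every pixel
    logits = np.einsum('kc,chw->khw', weights, fused)   # (K, H, W)
    return logits.argmax(axis=0)
```

The point the experiments make is that the textual channels carry signal the visual baseline lacks, so the fused predictor can separate visually similar but textually distinct segments.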