
    Interleaved Deep Artifacts-Aware Attention Mechanism for Concrete Structural Defect Classification.

    Automatic machine classification of concrete structural defects in images poses significant challenges because of a multitude of problems arising from the surface texture, such as the presence of stains, holes, colors, poster remains, graffiti, markings, and paint, along with uncontrolled weather and illumination conditions. In this paper, we propose an interleaved deep artifacts-aware attention mechanism (iDAAM) to classify multi-target multi-class and single-class defects from structural defect images. Our novel architecture is composed of interleaved fine-grained dense modules (FGDM) and concurrent dual attention modules (CDAM) to extract local discriminative features from concrete defect images. FGDM aggregates robust multi-layer information over a wide range of scales to describe visually similar overlapping defects. CDAM, in turn, selects multiple representations of highly localized overlapping defect features and encodes the crucial spatial regions from discriminative channels to address variations in texture, viewing angle, shape, and size of overlapping defect classes. Within iDAAM, FGDM and CDAM are interleaved to extract salient discriminative features at multiple scales, forming an end-to-end trainable network with no preprocessing steps, which makes the process fully automatic. Experimental results and extensive ablation studies on three publicly available large concrete defect datasets show that our proposed approach outperforms the current state-of-the-art methodologies.
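    The abstract names the interleaving of FGDM and CDAM but not their internals. Purely as a reading aid, the minimal PyTorch sketch below shows what such an interleaved stack could look like; the dense-convolution FGDM, the parallel channel/spatial CDAM, and all layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the interleaving idea only; module internals are assumed.
import torch
import torch.nn as nn

class FGDM(nn.Module):
    """Fine-grained dense module (assumed: dense connectivity, DenseNet-style)."""
    def __init__(self, channels, growth=32, layers=3):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ) for i in range(layers)
        ])
        self.fuse = nn.Conv2d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))  # dense aggregation
        return self.fuse(torch.cat(feats, dim=1))

class CDAM(nn.Module):
    """Concurrent dual attention (assumed: parallel channel + spatial branches)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        # Concurrent branches: channel-reweighted plus spatially-reweighted features.
        return x * self.channel(x) + x * self.spatial(x)

class iDAAM(nn.Module):
    """Interleave FGDM and CDAM end to end before a multi-label head."""
    def __init__(self, channels=64, blocks=3, num_classes=6):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[nn.Sequential(FGDM(channels), CDAM(channels))
                                    for _ in range(blocks)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        h = self.body(self.stem(x))
        h = h.mean(dim=(2, 3))   # global average pooling
        return self.head(h)      # logits; apply sigmoid for multi-label defects
```

    A usage note: for the multi-target (multi-label) setting the abstract describes, such a head would typically be trained with a per-class binary cross-entropy loss rather than softmax.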

    UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition

    Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of the computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG^2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground. Over 160,000 annotated frames for hundreds of ImageNet classes are available, which are used for baseline experiments that assess the impact of known and unknown image artifacts and other conditions on common deep learning-based object classification approaches. Further, current image restoration and enhancement techniques are evaluated by determining whether or not they improve baseline classification performance. Results show that there is plenty of room for algorithmic innovation, making this dataset a useful tool going forward.
    Comment: Supplemental material: https://goo.gl/vVM1xe, Dataset: https://goo.gl/AjA6En, CVPR 2018 Prize Challenge: ug2challenge.or
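    The evaluation protocol described here, restoring a frame and checking whether a pretrained classifier improves, can be sketched in a few lines. The following is a hedged illustration, not the paper's exact pipeline: the ResNet-50 classifier, the top-5 criterion, and the `restore` callback are assumptions made for the example.

```python
# Hypothetical sketch: does a restoration step help a stock ImageNet classifier?
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Any pretrained ImageNet classifier works here; ResNet-50 is an assumption.
classifier = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

def topk_hit(image: Image.Image, label: int, k: int = 5) -> bool:
    """True if the ground-truth ImageNet label appears in the top-k predictions."""
    with torch.no_grad():
        logits = classifier(preprocess(image).unsqueeze(0))
    return label in logits.topk(k).indices.squeeze(0).tolist()

def benchmark(frames, labels, restore):
    """Compare baseline vs. restored top-5 accuracy; `restore` is any
    PIL -> PIL enhancement function (deblurring, super-resolution, ...)."""
    base = sum(topk_hit(f, y) for f, y in zip(frames, labels))
    enh = sum(topk_hit(restore(f), y) for f, y in zip(frames, labels))
    n = len(labels)
    return base / n, enh / n
```

    The benchmark's finding that restoration does not reliably raise the baseline score is exactly what this kind of paired comparison is designed to expose.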

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts seeking to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. While the identification and categorization of segments of interest in document images have seen significant progress over recent years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Through a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other questions, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.
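    One simple way to realize such visual/textual fusion, offered here only as a hypothetical sketch rather than the paper's actual model, is to rasterize OCR token embeddings into a 2D map aligned with the page image and concatenate it with CNN features before per-pixel classification. The `paint_text_map` helper and all dimensions below are illustrative assumptions.

```python
# Hypothetical sketch of visual + textual fusion for page segmentation.
import torch
import torch.nn as nn

class MultimodalSegHead(nn.Module):
    def __init__(self, vis_ch=64, txt_dim=300, num_classes=5):
        super().__init__()
        self.visual = nn.Sequential(  # toy visual encoder standing in for a real backbone
            nn.Conv2d(3, vis_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(vis_ch, vis_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.classify = nn.Conv2d(vis_ch + txt_dim, num_classes, 1)

    def forward(self, image, text_map):
        """image: (B,3,H,W); text_map: (B,txt_dim,H,W), word embeddings
        painted into each token's bounding box from the OCR layer."""
        fused = torch.cat([self.visual(image), text_map], dim=1)
        return self.classify(fused)  # per-pixel class logits

def paint_text_map(boxes, embeddings, hw, dim=300):
    """Rasterize token embeddings into a (dim,H,W) map; boxes are
    (x0, y0, x1, y1) pixel coordinates from the OCR output."""
    h, w = hw
    out = torch.zeros(dim, h, w)
    for (x0, y0, x1, y1), emb in zip(boxes, embeddings):
        out[:, y0:y1, x0:x1] = emb.view(dim, 1, 1)
    return out
```

    The design choice this sketch illustrates: by painting embeddings into token bounding boxes, the text modality is brought into the same spatial frame as the pixels, so any standard segmentation architecture can consume both signals with a plain channel concatenation.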