9 research outputs found

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    Full text link
    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, including the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance.

    HSMA_WOA: A hybrid novel Slime mould algorithm with whale optimization algorithm for tackling the image segmentation problem of chest X-ray images

    Get PDF
    Recently, a novel virus, COVID-19, has spread worldwide, starting in China and reaching the rest of the world, and has killed many people. Many methods have been tried to identify COVID-19 infection, and X-ray imaging is one of them, used to detect the effect of COVID-19 on infected persons. According to X-ray analysis, COVID-19 can cause bilateral pulmonary parenchymal ground-glass and consolidative pulmonary opacities, sometimes with a rounded morphology and a peripheral lung distribution. Unfortunately, determining whether a person is infected with COVID-19 from X-ray images alone is very hard. X-ray images can be classified using machine learning techniques to determine whether a person is severely infected, mildly infected, or not infected. To improve the classification accuracy, the region of interest within the image that contains the features of COVID-19 must be extracted. This problem is called the image segmentation problem (ISP). Many techniques have been proposed to tackle ISP; the most commonly used, owing to its simplicity, speed, and accuracy, is threshold-based segmentation. This paper proposes a new hybrid approach based on the thresholding technique to tackle ISP for COVID-19 chest X-ray images by integrating a novel meta-heuristic algorithm known as the slime mould algorithm (SMA) with the whale optimization algorithm to maximize Kapur's entropy. The performance of the integrated SMA has been evaluated on 12 chest X-ray images with threshold levels up to 30 and compared with five algorithms (the Lshade algorithm, the whale optimization algorithm (WOA), the FireFly algorithm (FFA), the Harris-hawks algorithm (HHA), and the salp swarm algorithm (SSA)) as well as the standard SMA. The experimental results demonstrate that the proposed algorithm outperforms SMA under Kapur's entropy for all the metrics used, and that the standard SMA performs better than the other algorithms in the comparison under all the metrics.

    Datasets and Models for Historical Newspaper Article Segmentation

    No full text
    Dataset and models used and produced in the work described in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers": https://infoscience.epfl.ch/record/282863?ln=e

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    No full text
    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, including the use of more fine-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. We introduce a multimodal neural model for the semantic segmentation of historical newspapers that directly combines visual features at pixel level with text embedding maps derived from potentially noisy OCR output. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to the wide variety of our material.
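One way to picture "combining visual features at pixel level with text embedding maps" is the sketch below. It is an illustration under stated assumptions, not the authors' model: it assumes each OCR token comes with a pixel bounding box and an embedding vector, that the token's vector is painted over every pixel in its box, and that the result is concatenated channel-wise with the image. The function names and the data layout are hypothetical.

```python
def text_embedding_map(height, width, tokens, dim):
    """Build a height x width grid of dim-sized vectors.

    tokens is a list of ((x0, y0, x1, y1), vector) pairs, with x1 and y1
    exclusive. Pixels not covered by any token box keep a zero vector.
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), vec in tokens:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = list(vec)  # later tokens overwrite earlier ones
    return grid

def concat_channels(image, text_map):
    """Concatenate per-pixel visual channels with per-pixel text channels."""
    return [[pix + emb for pix, emb in zip(img_row, txt_row)]
            for img_row, txt_row in zip(image, text_map)]

# Toy 2 x 3 image with 3 visual channels and one OCR token (made-up values).
image = [[[0.5, 0.5, 0.5] for _ in range(3)] for _ in range(2)]
tokens = [((0, 0, 2, 1), [1.0, 2.0])]
fused = concat_channels(image, text_embedding_map(2, 3, tokens, dim=2))
print(fused[0][0], fused[1][2])
```

In a real pipeline the fused grid would be a tensor fed to a segmentation network. Nested lists are used here only to keep the sketch free of dependencies.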

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    No full text