
    BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

    While strides have been made in deep-learning-based Bengali Optical Character Recognition (OCR) over the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR to document transcription, e.g., transcribing historical documents and newspapers. Moreover, the rule-based DLA systems currently employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first large multi-domain Bengali Document Layout Analysis dataset: BaDLAD. The dataset contains 33,695 human-annotated document samples from six domains - i) books and magazines, ii) public-domain government documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds - with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset for training deep-learning-based Bengali document digitization models.
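
    As a rough illustration of how such polygon annotations might be consumed downstream, the sketch below loads a hypothetical JSON export and tallies annotations per unit type; the field names and file layout are assumptions for illustration, not BaDLAD's actual schema.

```python
# Hypothetical sketch of iterating over polygon-annotated DLA samples;
# "annotations", "unit_type" and "polygon" are illustrative field names,
# not the dataset's published format.
import json

UNIT_TYPES = {"text-box", "paragraph", "image", "table"}

def load_annotations(path):
    """Load document samples, each carrying a list of polygon annotations."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    for sample in samples:
        for ann in sample["annotations"]:
            assert ann["unit_type"] in UNIT_TYPES
            # ann["polygon"] would be a flat list [x1, y1, x2, y2, ...]
    return samples

def count_by_type(samples):
    """Tally polygon annotations per unit type across all samples."""
    counts = dict.fromkeys(UNIT_TYPES, 0)
    for sample in samples:
        for ann in sample["annotations"]:
            counts[ann["unit_type"]] += 1
    return counts
```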

    Segmentation et indexation d'objets complexes dans les images de bandes dessinées

    In this thesis, we review, highlight and illustrate the challenges related to comic book image analysis, giving the reader an overview of the latest research progress in this field and of the open issues. We propose three approaches for comic book image analysis, each composed of several processing steps. The first approach is called "sequential" because the image content is described in an intuitive way, from simple to complex elements, with previously extracted elements guiding further processing. Simple elements such as panels, text and balloons are extracted first, followed by the balloon tails and then the positions of the comic characters within the panels. The second approach addresses independent information extraction to overcome the main drawback of the first approach: error propagation. This method is called "independent" because it is composed of specific extractors for each element of the image, with no dependence between them. Additional processing such as balloon type classification and text recognition is also covered. The third approach introduces a knowledge-driven and scalable system for comics image understanding. This "expert system" is composed of an inference engine and two models, one for the comics domain and one for image processing, stored in an ontology. It combines the benefits of the first two approaches and enables high-level semantic descriptions such as the reading order of panels and text, the relations between speech balloons and their speakers, and the identification of comic characters.
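
    The sketch below illustrates, in hypothetical Python, the chaining that makes the "sequential" approach vulnerable to error propagation: each stage consumes the previous stage's output, so a missed panel or balloon silently removes everything downstream. All extractor names and signatures are invented for illustration, not the thesis's actual components.

```python
# Illustrative stubs for a sequential comic page analysis pipeline.

def extract_panels(page_image):
    """Stub: would segment the page into panel crops."""
    return []

def extract_balloons(panel):
    """Stub: would detect speech balloons inside one panel crop."""
    return []

def extract_tail(balloon):
    """Stub: would locate the tail on a balloon outline."""
    return None

def locate_characters(panel, tails):
    """Stub: would find character positions, guided by where tails point."""
    return []

def sequential_analysis(page_image):
    # Each stage depends on the previous one; an early miss propagates
    # through every later extraction step.
    results = []
    for panel in extract_panels(page_image):
        balloons = extract_balloons(panel)
        tails = [extract_tail(b) for b in balloons]
        characters = locate_characters(panel, tails)
        results.append({"panel": panel, "balloons": balloons,
                        "tails": tails, "characters": characters})
    return results
```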

    GeoAI-enhanced Techniques to Support Geographical Knowledge Discovery from Big Geospatial Data

    Big data that contain geo-referenced attributes have significantly transformed the way I process and analyze geospatial data. Compared with the benefits expected from such a data-rich environment, more data have not always led to more accurate analysis. "Big but valueless" has become a critical concern in the GIScience and data-driven geography community. As a highly utilized class of GeoAI techniques, deep learning models designed for processing geospatial data integrate powerful computing hardware and deep neural networks across various dimensions of geography to effectively discover representations of the data. However, limitations of these deep learning models have also been reported, as people may have to spend considerable time preparing training data before a deep learning model can be implemented. The objective of this dissertation research is to advance state-of-the-art deep learning models in discovering the representation, value and hidden knowledge of GIS and remote sensing data, through three research approaches. The first methodological framework aims to unify multifarious shadow shapes into a limited number of representative shadow patterns through convolutional neural network (CNN)-powered shape classification, enabling efficient shadow-based building height estimation. The second research focus integrates semantic analysis into a framework of state-of-the-art CNNs to support human-level understanding of map content. The final research approach focuses on normalizing geospatial domain knowledge to promote the transferability of a CNN model to land-use/land-cover classification. This research reports a method designed to discover detailed land-use/land-cover types that may be challenging for a state-of-the-art CNN model that previously performed well only on land-cover classification.
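
    As a minimal sketch of the shape-classification idea in the first framework, the PyTorch model below maps rasterized shadow masks onto a small set of representative patterns; the architecture, input size and number of patterns are assumptions for illustration, not the dissertation's actual model.

```python
# Toy CNN that classifies binary shadow masks into a few representative
# shadow patterns; all hyperparameters here are illustrative.
import torch
import torch.nn as nn

class ShadowShapeCNN(nn.Module):
    def __init__(self, num_patterns=8):  # number of representative patterns is assumed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_patterns)

    def forward(self, x):                 # x: (N, 1, 64, 64) shadow masks
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Usage: score a batch of rasterized shadow footprints.
model = ShadowShapeCNN()
masks = torch.rand(4, 1, 64, 64)
pattern_logits = model(masks)             # shape (4, num_patterns)
```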

    An ontology-based framework for the automated analysis and interpretation of comic books' images

    Since the beginning of the twenty-first century, the cultural industry has been undergoing a massive and historic mutation induced by the rise of digital technologies. The comic book industry is still looking for the right solution and has not yet produced anything as convincing as the music or movie industries have. So far, a lot of energy has been spent transferring printed material to digital supports. The specificities of those supports are not always exploited to the best of their capabilities, although they could potentially be used to create new reading conventions. Despite the needs induced by the large amount of data created since the beginning of comics history, content indexing has been left behind. It is indeed quite a challenge to index such a composition of textual and visual information. While a growing number of researchers are working on comic book image analysis from a low-level point of view, only a few are tackling the issue of representing the content at a high semantic level. In this article we propose a framework to handle the content of a comic book, to support the automatic extraction of its visual components and to formalize the semantics of the domain's codes. We tested our framework in two applications: 1) the unsupervised content discovery of comic book images, and 2) its ability to handle complex layouts and to produce a respectful browsing experience for the digital comics reader.
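
    To make the ontology-driven description more concrete, the sketch below encodes a panel/balloon/speaker relation as RDF triples and queries it with SPARQL via rdflib; the namespace and property names are invented for illustration and are not the paper's actual vocabulary.

```python
# Toy comics-domain graph: a panel contains a balloon, the balloon is
# spoken by a character. Vocabulary is hypothetical.
from rdflib import Graph, Namespace, RDF

COMIC = Namespace("http://example.org/comic#")
g = Graph()

g.add((COMIC.panel1, RDF.type, COMIC.Panel))
g.add((COMIC.balloon1, RDF.type, COMIC.SpeechBalloon))
g.add((COMIC.character1, RDF.type, COMIC.Character))
g.add((COMIC.panel1, COMIC.contains, COMIC.balloon1))
g.add((COMIC.balloon1, COMIC.spokenBy, COMIC.character1))

# Query: which character speaks each balloon in panel1?
q = """
SELECT ?balloon ?speaker WHERE {
  <http://example.org/comic#panel1> <http://example.org/comic#contains> ?balloon .
  ?balloon <http://example.org/comic#spokenBy> ?speaker .
}
"""
for balloon, speaker in g.query(q):
    print(balloon, "is spoken by", speaker)
```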

    Implementing non-photorealistic rendering enhancements with real-time performance

    We describe quality and performance enhancements, which work in real time, to all well-known non-photorealistic rendering (NPR) styles for use in an interactive context. These include comic rendering, sketch rendering, hatching and painterly rendering, but we also attempt and justify a widening of the established definition of what is considered NPR. In the individual chapters, we identify typical stylistic elements of the different NPR styles. We list problems that need to be solved in order to implement the various renderers. Standard solutions available in the literature are introduced and in all cases extended and optimised. In particular, we extend the lighting model of the comic renderer to include a specular component and introduce multiple inter-related but independent geometric approximations which greatly improve rendering performance. We implement two completely different solutions to random-perturbation sketching, solve temporal coherence issues for coal sketching, and find an unexpected use for 3D textures to implement hatch-shading. The textured brushes of painterly rendering are extended with properties such as stroke direction and texture, motion, paint capacity, opacity and emission, making them more flexible and versatile. Brushes are also provided with a minimal amount of intelligence so that they can help maximise screen coverage. We furthermore devise a completely new NPR style, which we call super-realistic, and show how sample images can be tweened in real time to produce an image-based six-degree-of-freedom renderer performing at roughly 450 frames per second. Performance values for our other renderers all lie between 10 and over 400 frames per second on home PC hardware, justifying our real-time claim. A large number of sample screenshots, illustrations and animations demonstrate the visual fidelity of our rendered images. In essence, we successfully achieve our attempted goals of increasing the creative, expressive and communicative potential of individual NPR styles, increasing the performance of most of them, adding original and interesting visual qualities, and exploring new techniques or existing ones in novel ways.
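
    The sketch below shows one common way a comic lighting model can be extended with a specular component, as described above: quantize the diffuse response into bands and add a thresholded Blinn-Phong highlight. The band count, shininess and threshold are illustrative defaults, not the thesis's parameters.

```python
# Cel-style shading for a single surface point: banded diffuse plus a
# hard-edged specular spot. All constants are illustrative.
import numpy as np

def comic_shade(normal, light_dir, view_dir, bands=3, shininess=32.0,
                spec_threshold=0.5):
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)

    # Diffuse term quantized into discrete bands (classic toon look).
    diffuse = max(np.dot(n, l), 0.0)
    diffuse = np.floor(diffuse * bands) / bands

    # Blinn-Phong specular, clamped to a hard highlight spot.
    h = (l + v) / np.linalg.norm(l + v)
    spec = max(np.dot(n, h), 0.0) ** shininess
    specular = 1.0 if spec > spec_threshold else 0.0

    return min(diffuse + specular, 1.0)

# Example: surface facing the light, viewer nearly head-on.
print(comic_shade(np.array([0.0, 0.0, 1.0]),
                  np.array([0.0, 0.0, 1.0]),
                  np.array([0.0, 0.2, 1.0])))
```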

    Artistic Content Representation and Modelling based on Visual Style Features

    This thesis aims to understand visual style in the context of computer science, using traditionally intangible artistic properties to enhance existing content manipulation algorithms and to develop new content creation methods. The developed algorithms can be used to apply extracted properties to other drawings automatically; transfer a selected style; categorise images based upon perceived style; build 3D models using style features from concept artwork; and perform other style-based actions that change our perception of an object without changing our ability to recognise it. The research in this thesis aims to provide the style manipulation abilities that are missing from modern digital art creation pipelines.