
    Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

    Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To handle the large diversity in scripts and the presence of dense, irregular layout elements (e.g. text lines, pictures, multiple documents per image), we adapt a Fully Convolutional Deep Neural Network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility, and keeping in mind that domain experts are often non-technical, we also contribute a custom web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale.
    Comment: Oral presentation at the International Conference on Document Analysis and Recognition (ICDAR) 2019. For the dataset, pre-trained networks and additional details, visit the project page at http://ihdia.iiit.ac.in
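    Instance-level layout parsing predicts one mask per region (for example, one per text line or picture), so a prediction is commonly scored by matching it to a ground-truth region of the same class using intersection-over-union (IoU). The abstract does not state which metric Indiscapes uses; the sketch below is a generic, hypothetical illustration of IoU-based instance matching. Masks are represented as sets of pixel coordinates, the 0.5 IoU threshold and the class names are assumptions, and predictions are assumed to be pre-sorted by confidence.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Instance = Tuple[str, Set[Pixel]]  # (class label, set of (row, col) pixels)


def box(r0: int, c0: int, r1: int, c1: int) -> Set[Pixel]:
    """Hypothetical helper: rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two binary masks given as pixel sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def match_instances(preds: List[Instance], gts: List[Instance],
                    threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match each prediction to the best unmatched ground-truth
    instance of the same class; returns (true pos., false pos., false neg.)."""
    unmatched = list(range(len(gts)))
    tp = 0
    for label, mask in preds:
        best_j, best_iou = None, threshold
        for j in unmatched:
            gt_label, gt_mask = gts[j]
            if gt_label != label:
                continue  # only compare regions of the same class
            iou = mask_iou(mask, gt_mask)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            unmatched.remove(best_j)  # each ground-truth region matches once
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


if __name__ == "__main__":
    gt = [("text_line", box(0, 0, 2, 10)), ("picture", box(5, 5, 9, 9))]
    pred = [("text_line", box(0, 0, 2, 8)), ("picture", box(0, 20, 3, 23))]
    print(match_instances(pred, gt))
```

From the matched counts, precision and recall per class follow in the usual way; a full benchmark would also sweep several IoU thresholds.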

    Deep Learning for Scene Text Detection, Recognition, and Understanding

    Detecting and recognizing text in images is a long-standing task in computer vision. The goal of this task is to extract textual information from images and videos, for example to recognize license plates. Despite great progress in recent years, the task remains challenging because of the wide range of variation in text appearance. In this thesis, we aim to review the issues that hinder current Optical Character Recognition (OCR) development and explore potential solutions. Specifically, we first investigate unfair comparisons between different OCR algorithms caused by the lack of a consistent evaluation framework. The absence of a unified evaluation protocol leads to inconsistent and unreliable results, making it difficult to compare and improve upon existing methods. To tackle this issue, we design a new evaluation framework covering datasets, metrics, and models, enabling consistent and fair comparisons between OCR systems. Another issue in the field is the imbalanced distribution of training samples. In particular, the sample distribution depends largely on where and how the data was collected, and the resulting data bias may lead to poor performance and low generalizability on under-represented classes. To address this problem, we take license plate recognition as an example and propose a text-to-image model that can synthesize photo-realistic text samples. Using this model, we synthesized more than one million samples to augment the training dataset, significantly improving the generalization capability of OCR models. Additionally, this thesis explores text-based visual question answering (text VQA), a new and emerging research topic in the OCR community. This task challenges OCR models to understand the relationships between the text and its background and to answer the given questions.
Specifically, we propose to investigate evidence-based text VQA, which involves designing models that can provide reasonable evidence for their predictions, thereby improving their generalization ability.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 202
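    The abstract does not list the metrics of the proposed framework. As a hypothetical illustration of what a consistent evaluation protocol means in practice, the sketch below scores every model's recognized strings with one fixed normalization rule and two common metrics: word accuracy and 1 minus normalized edit distance. The normalization rule and all function names are assumptions, not the framework proposed in the thesis.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> str:
    """One fixed rule applied to every system's output and every label
    (an assumed rule): lowercase, then keep only ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (word accuracy, mean 1 - normalized edit distance) over
    (prediction, ground_truth) pairs, with identical rules for every model."""
    if not pairs:
        raise ValueError("need at least one (prediction, ground_truth) pair")
    correct = 0
    similarity = 0.0
    for pred, gt in pairs:
        p, g = normalize(pred), normalize(gt)
        correct += p == g
        denom = max(len(p), len(g)) or 1  # two empty strings count as a match
        similarity += 1.0 - edit_distance(p, g) / denom
    return correct / len(pairs), similarity / len(pairs)


if __name__ == "__main__":
    # Hypothetical license-plate predictions paired with ground truth.
    pairs = [("ABC-123", "abc123"), ("XY7 89Z", "XY789Z"), ("K1M 5ST", "KLM5ST")]
    print(evaluate(pairs))
```

Because normalization happens inside `evaluate`, two models that differ only in casing or punctuation conventions receive the same score, which is the kind of inconsistency a unified protocol is meant to remove.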

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images may be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly affects the reliability and consistency of classification. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that aims to enhance the generalization ability of CNN models in the presence of noise in the test set that was not seen during training. Concretely, the delineation maps of given images are computed using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly better performance on test images perturbed with different levels of Gaussian and uniform noise.
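    The evaluation described above, comparing models on clean test images and on copies perturbed with Gaussian and uniform noise at several levels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: pixel values are assumed scaled to [0, 1], the noise levels are hypothetical, and `classify` is a hypothetical stand-in for either the plain AlexNet or the CORF-augmented pipeline. The CORF operator itself is not implemented here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, pixel values assumed in [0, 1]


def _clip(x: float) -> float:
    return min(1.0, max(0.0, x))


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def add_uniform_noise(img: Image, half_width: float, rng: random.Random) -> Image:
    """Add noise drawn uniformly from [-half_width, half_width], then clip."""
    return [[_clip(p + rng.uniform(-half_width, half_width)) for p in row] for row in img]


def perturbed_test_sets(images: Sequence[Image], levels: Sequence[float],
                        seed: int = 0) -> Dict[Tuple[str, float], List[Image]]:
    """One noisy copy of the test set per (noise type, level); the fixed seed
    makes every model see exactly the same perturbed images."""
    rng = random.Random(seed)
    sets: Dict[Tuple[str, float], List[Image]] = {}
    for level in levels:
        sets[("gaussian", level)] = [add_gaussian_noise(im, level, rng) for im in images]
        sets[("uniform", level)] = [add_uniform_noise(im, level, rng) for im in images]
    return sets


def accuracy(classify: Callable[[Image], int], images: Sequence[Image],
             labels: Sequence[int]) -> float:
    """Fraction of images whose predicted class matches the label."""
    correct = sum(classify(im) == y for im, y in zip(images, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Two toy 28x28 images standing in for Fashion MNIST test samples.
    images = [[[0.2] * 28 for _ in range(28)], [[0.8] * 28 for _ in range(28)]]
    labels = [0, 1]

    def classify(im: Image) -> int:
        # Hypothetical classifier: threshold on mean brightness.
        return int(sum(map(sum, im)) / (28 * 28) > 0.5)

    for key, noisy in perturbed_test_sets(images, levels=[0.1, 0.3]).items():
        print(key, accuracy(classify, noisy, labels))
```

In a real comparison the same noisy sets would be passed once to the baseline classifier and once to the CORF-preprocessed one, so any accuracy gap reflects the preprocessing rather than different noise draws.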

    The expectations, outcomes and perceived benefits of postgraduate business programmes for Indian nationals

    The literature suggests that several stakeholders influence postgraduate study for social and personal benefits. These stakeholders comprise governments, employers, higher education institutions (HEIs), families and individuals. The findings from this research study suggest that fathers from Indian middle-class families play a significant motivational role in the lives of their offspring by encouraging them to develop their cultural capital through postgraduate business programmes. Parents expect that higher-level academic study will improve their offspring’s symbolic and social capital and result in good social and career outcomes (Bourdieu, 1986). Many of these middle-class parents, who have access to economic capital and are willing to use it, also influence and support their children to gain ‘exposure’ in new environments, for example in the USA and the UK. The aim of this parental competitive strategy is to ensure ‘positional advantage’ (Brown, 2003, p3) in the labour market and, in some cases, to gain the experience of foreign study that they did not have the opportunity to undertake. As more Indian nationals undertake postgraduate business programmes, there is evidence to suggest that credentialism is resulting in what Brown, Lauder & Ashton (2011) argue is a ‘global auction’, bringing greater rewards only for the very best or the educated elite. This perception was also evident among the respondents in this study. Brown, Lauder & Ashton (2011) further argue that this is perpetuating social divisions in different societies as the labour market becomes more competitive due to economic trends and corporate restructuring. The findings from this study suggest that most Indian respondents who have postgraduate business qualifications meet some of their expectations, but not at the management level, nor in other areas that they had expected; for example, they earn a lower salary than expected.
To ensure that graduates’ career expectations are realistic, the findings suggest that UK and Indian higher education institutions should report the destinations and career outcomes of all their Indian business postgraduates ethically and honestly. The findings also suggest that UK and Indian institutions should improve their alumni services and forge closer links with Indian employers to support graduates’ career opportunities. Evidence was also found of a perception that UK credentials carry greater symbolic capital, which may add value to an individual’s employment opportunities, to their marriage capital and, where it occurs, to their dowry/gift capital.