9,135 research outputs found

    Recognizing Bengali Word Images - A Zero-Shot Learning Perspective

    Zero-Shot Learning (ZSL) techniques can classify instances of classes never seen during training. This makes ZSL well suited to real-life classification problems in which it is not possible to train a system with annotated data for every possible class. This work investigates the recognition of word images written in Bengali script within a ZSL framework. The proposed approach performs zero-shot word recognition by coupling deep features learned by various CNN architectures with 13 basic shape/stroke primitives commonly observed in Bengali script characters. Following the ZSL framework, these 13 basic shapes are termed “signature/semantic attributes”. The results are promising, with evaluation carried out in a five-fold cross-validation setup on samples from 250 word classes.
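The attribute-based matching described in this abstract can be sketched schematically: CNN features of a word image are mapped to scores over the 13 stroke primitives, and the word class whose known signature is most similar is selected, even if that class was never seen in training. This is an illustrative sketch only; the random binary signatures and cosine matching below are stand-ins, not the paper's actual data or method.

```python
import numpy as np

# Illustrative stand-ins: each of 250 word classes is described by a
# 13-dimensional binary signature over basic shape/stroke primitives.
rng = np.random.default_rng(0)
num_classes, num_attributes = 250, 13
class_signatures = rng.integers(0, 2, size=(num_classes, num_attributes)).astype(float)

def predict_class(attribute_scores: np.ndarray) -> int:
    """Assign the class whose signature is most similar (cosine) to the
    attribute scores predicted from CNN features of a word image."""
    sims = class_signatures @ attribute_scores
    norms = (np.linalg.norm(class_signatures, axis=1)
             * np.linalg.norm(attribute_scores) + 1e-9)
    return int(np.argmax(sims / norms))

# An unseen class can still be recognized, as long as its signature is known.
scores = rng.random(num_attributes)   # pretend these came from a CNN
predicted = predict_class(scores)
```

The key point the sketch captures is that the classifier never needs training examples of a class, only its attribute signature.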

    Writer adaptation for offline text recognition: An exploration of neural network-based methods

    Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should be adaptive to new writing styles in order to handle the vast amount of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance.
    Comment: 21 pages including appendices, 6 figures, 10 tables
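The few-shot adaptation setting above (adapt to a new writer from ~16 support examples) can be illustrated on a toy problem. The sketch below uses a linear model and a Reptile-style first-order approximation of the meta-learning idea rather than MetaHTR itself; all "writers", data, and hyperparameters are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss_grad(w, X, y):
    # gradient of mean squared error for a linear model
    return 2 * X.T @ (X @ w - y) / len(y)

def inner_adapt(w, X, y, lr=0.1, steps=3):
    # writer-specific adaptation: a few gradient steps on support data
    for _ in range(steps):
        w = w - lr * loss_grad(w, X, y)
    return w

# meta-training across simulated "writers", each a different linear task
w_meta = np.zeros(4)
for _ in range(200):
    task_w = rng.normal(size=4)        # a writer's "style"
    X = rng.normal(size=(16, 4))       # 16 support examples, as in the paper
    y = X @ task_w
    w_adapted = inner_adapt(w_meta, X, y)
    # Reptile-style outer update: nudge meta-weights toward adapted weights
    w_meta = w_meta + 0.05 * (w_adapted - w_meta)
```

After meta-training, `w_meta` serves as an initialization from which a few gradient steps on a new writer's handful of examples noticeably reduce that writer's loss, which is the behavior MAML-style methods aim for.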

    DeViL: Decoding Vision features into Language

    Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method decodes vision features into language, not only highlighting the attribution locations but also generating textual descriptions of visual features at different layers of the network. We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language. By employing dropout both per-layer and per-spatial-location, our model can generalize training on image-text pairs to generate localized explanations. As it uses a pre-trained language model, our approach is fast to train, can be applied to any vision backbone, and produces textual descriptions at different layers of the vision network. Moreover, DeViL can create open-vocabulary attribution maps corresponding to words or phrases even outside the training scope of the vision model. We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M, surpassing previous lightweight captioning models, and attribution maps uncovering the learned concepts of the vision backbone. Finally, we show DeViL also outperforms the current state-of-the-art on the neuron-wise descriptions of the MILANNOTATIONS dataset. Code available at https://github.com/ExplainableML/DeViL
    Comment: Accepted at GCPR 2023 (Oral)
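The core interface the abstract describes, translating a single vision-layer feature vector into a prompt that a frozen language model could decode, can be sketched in terms of shapes. The dimensions, the single linear translator, and all names below are assumptions for illustration, not DeViL's actual architecture.

```python
import numpy as np

# assumed dimensions: vision feature size, soft-prompt length, LM embedding size
rng = np.random.default_rng(2)
feat_dim, prompt_len, lm_dim = 768, 10, 512

# the learned translator, shown here as one linear map per prompt slot
# (the paper trains a transformer; this is only a shape-level stand-in)
W = rng.normal(scale=0.02, size=(prompt_len, lm_dim, feat_dim))

def features_to_prompt(v: np.ndarray) -> np.ndarray:
    """Map one per-layer, per-spatial-location feature vector to a
    sequence of soft-prompt embeddings for a frozen language model."""
    return np.einsum('pdf,f->pd', W, v)

vision_feature = rng.normal(size=feat_dim)   # one spatial location's feature
prompt = features_to_prompt(vision_feature)  # shape (prompt_len, lm_dim)
```

Because the translation is done per spatial location, decoding each location's prompt separately is what yields the localized explanations and attribution maps the abstract mentions.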

    One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

    When learning a new skill, you take advantage of your preexisting skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play cello. Similarly, when learning a new language you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow for broadening this approach, by considering data for multiple tasks and languages simultaneously. This is an important approach to explore further as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP which are currently to a large extent reserved for high-resource languages. This, in turn, may then have positive consequences for, e.g., language preservation, as speakers of minority languages will be under less pressure to use high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal.
    Comment: PhD thesis, University of Groningen

    Advancing natural language processing in political science


    Detecting graves in GPR data: assessing the viability of machine learning for the interpretation of graves in B-scan data using medieval Irish case studies.

    Get PDF
    As commercial archaeogeophysical survey progressively shifts towards large landscape-scale surveys, small features like graves become more difficult to identify and interpret. In order to increase the rate and confidence of grave identification before excavation using geophysical methods, the accuracy and speed of survey outputs and reporting must be improved. The approach taken in this research was first to consider the survey parameters that govern the effectiveness of the four conventional techniques used in commercial archaeogeophysical evaluations (magnetometry, earth resistance, electromagnetic induction and ground-penetrating radar). Subsequently, in respect of ground-penetrating radar (GPR), this research developed machine learning applications to improve the speed and confidence of detecting inhumation graves. The survey parameters research combined established survey guidelines for the UK, Ireland, and Europe to account for local geology, soils and land cover, providing survey guidance for individual sites via a decision-based application linked to a GIS database. To develop two machine learning tools for localising and probability-scoring grave-like responses in GPR data, convolutional neural networks and transfer learning were used to analyse radargrams of medieval graves and timeslices of modern proxy clandestine graves. Models were c. 93% accurate at labelling images as containing a grave or no grave and c. 96% accurate in labelling and locating potential graves in radargram images. For timeslices, machine learning models achieved 94% classification accuracy. The >90% accuracy of the machine learning models demonstrates the viability of machine-assisted detection of inhumation graves within GPR data. While expanding the training dataset would further improve the accuracy of the proposed methods, the current machine-led interpretation methods provide valuable assistance for human-led interpretation until more data becomes available.
    The survey guidance tool and the two machine learning applications have been packaged into the Reilig web application toolset, which is freely available.
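The transfer-learning recipe this abstract relies on (a pretrained backbone producing features, with only a small classifier head trained on the new task) can be sketched with synthetic stand-ins. The hand-crafted "backbone", the synthetic radargram-like images, and the logistic-regression head below are illustrative assumptions, not the thesis's actual models or data.

```python
import numpy as np

rng = np.random.default_rng(3)

def frozen_backbone(img: np.ndarray) -> np.ndarray:
    # stand-in for a pretrained CNN's frozen penultimate-layer features
    return np.array([img.mean(), img.std(), img.max(), img.min()])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_image(has_grave: bool) -> np.ndarray:
    # synthetic "radargram": a grave-like response adds a bright blob
    img = rng.normal(size=(32, 32))
    if has_grave:
        img[10:20, 10:20] += 2.0
    return img

labels = rng.integers(0, 2, size=400)
X = np.array([frozen_backbone(make_image(bool(lab))) for lab in labels])
y = labels.astype(float)

# train only the small head (logistic regression) with gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * float(np.mean(p - y))

accuracy = float(np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1)))
```

Freezing the backbone is what makes this practical with small archaeological datasets: only a handful of head parameters must be learned from the limited labelled radargrams.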