9,135 research outputs found
Recognizing Bengali Word Images - A Zero-Shot Learning Perspective
Zero-Shot Learning(ZSL) techniques could classify a completely unseen class, which it has never seen before during training. Thus, making it more apt for any real-life classification problem, where it is not possible to train a system with annotated data for all possible class types. This work investigates recognition of word images written in Bengali Script in a ZSL framework. The proposed approach performs Zero-Shot word recognition by coupling deep learned features procured from various CNN architectures along with 13 basic shapes/stroke primitives commonly observed in Bengali script characters. As per the notion of ZSL framework those 13 basic shapes are termed as “Signature/Semantic Attributes”. The obtained results are promising while evaluation was carried out in a Five-Fold cross-validation setup dealing with samples from 250 word classes
Writer adaptation for offline text recognition: An exploration of neural network-based methods
Handwriting recognition has seen significant success with the use of deep
learning. However, a persistent shortcoming of neural networks is that they are
not well-equipped to deal with shifting data distributions. In the field of
handwritten text recognition (HTR), this shows itself in poor recognition
accuracy for writers that are not similar to those seen during training. An
ideal HTR model should be adaptive to new writing styles in order to handle the
vast amount of possible writing styles. In this paper, we explore how HTR
models can be made writer adaptive by using only a handful of examples from a
new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used
as base models, using a ResNet backbone along with either an LSTM or
Transformer sequence decoder. Using these base models, two methods are
considered to make them writer adaptive: 1) model-agnostic meta-learning
(MAML), an algorithm commonly used for tasks such as few-shot classification,
and 2) writer codes, an idea originating from automatic speech recognition.
Results show that an HTR-specific version of MAML known as MetaHTR improves
performance compared to the baseline with a 1.4 to 2.0 improvement in word
error rate (WER). The improvement due to writer adaptation is between 0.2 and
0.7 WER, where a deeper model seems to lend itself better to adaptation using
MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models
or sentence-level HTR may become prohibitive due to its high computational and
memory requirements. Lastly, writer codes based on learned features or Hinge
statistical features did not lead to improved recognition performance.Comment: 21 pages including appendices, 6 figures, 10 table
DeViL: Decoding Vision features into Language
Post-hoc explanation methods have often been criticised for abstracting away
the decision-making process of deep neural networks. In this work, we would
like to provide natural language descriptions for what different layers of a
vision backbone have learned. Our DeViL method decodes vision features into
language, not only highlighting the attribution locations but also generating
textual descriptions of visual features at different layers of the network. We
train a transformer network to translate individual image features of any
vision layer into a prompt that a separate off-the-shelf language model decodes
into natural language. By employing dropout both per-layer and
per-spatial-location, our model can generalize training on image-text pairs to
generate localized explanations. As it uses a pre-trained language model, our
approach is fast to train, can be applied to any vision backbone, and produces
textual descriptions at different layers of the vision network. Moreover, DeViL
can create open-vocabulary attribution maps corresponding to words or phrases
even outside the training scope of the vision model. We demonstrate that DeViL
generates textual descriptions relevant to the image content on CC3M surpassing
previous lightweight captioning models and attribution maps uncovering the
learned concepts of the vision backbone. Finally, we show DeViL also
outperforms the current state-of-the-art on the neuron-wise descriptions of the
MILANNOTATIONS dataset. Code available at
https://github.com/ExplainableML/DeViLComment: Accepted at GCPR 2023 (Oral
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
When learning a new skill, you take advantage of your preexisting skills and
knowledge. For instance, if you are a skilled violinist, you will likely have
an easier time learning to play cello. Similarly, when learning a new language
you take advantage of the languages you already speak. For instance, if your
native language is Norwegian and you decide to learn Dutch, the lexical overlap
between these two languages will likely benefit your rate of language
acquisition. This thesis deals with the intersection of learning multiple tasks
and learning multiple languages in the context of Natural Language Processing
(NLP), which can be defined as the study of computational processing of human
language. Although these two types of learning may seem different on the
surface, we will see that they share many similarities.
The traditional approach in NLP is to consider a single task for a single
language at a time. However, recent advances allow for broadening this
approach, by considering data for multiple tasks and languages simultaneously.
This is an important approach to explore further as the key to improving the
reliability of NLP, especially for low-resource languages, is to take advantage
of all relevant data whenever possible. In doing so, the hope is that in the
long term, low-resource languages can benefit from the advances made in NLP
which are currently to a large extent reserved for high-resource languages.
This, in turn, may then have positive consequences for, e.g., language
preservation, as speakers of minority languages will have a lower degree of
pressure to using high-resource languages. In the short term, answering the
specific research questions posed should be of use to NLP researchers working
towards the same goal.Comment: PhD thesis, University of Groninge
Detecting graves in GPR data: assessing the viability of machine learning for the interpretation of graves in B-scan data using medieval Irish case studies.
As commercial archaeogeophysical survey progressively shifts towards large landscape-scale surveys, small features like graves become more difficult to identify and interpret. In order to increase the rate and confidence of grave identification before excavation using geophysical methods, the accuracy and speed of survey outputs and reporting must be improved. The approach taken in this research was first to consider the survey parameters that govern the effectiveness of the four conventional techniques used in commercial archaeogeophysical evaluations (magnetometry, earth resistance, electromagnetic induction and ground-penetrating radar). Subsequently, in respect of ground-penetrating radar (GPR), this research developed machine learning applications to improve the speed and confidence of detecting inhumation graves. The survey parameters research combined established survey guidelines for the UK, Ireland, and Europe to account for local geology, soils and land cover to provide survey guidance for individual sites via a decision-based application linked to GIS database. To develop two machine learning tools for localising and probability scoring grave-like responses in GPR data, convolutional neural networks and transfer learning were used to analyse radargrams of medieval graves and timeslices of modern proxy clandestine graves. Models were c. 93% accurate at labelling images as containing a grave or no grave and c. 96% accurate in labelling and locating potential graves in radargram images. For timeslices, machine learning models achieved 94% classification accuracy. The >90% accuracy of the machine learning models demonstrates the viability of machine-assisted detection of inhumation graves within GPR data. While the expansion of the training dataset would further improve the accuracy of the proposed methods, the current machine-led interpretation methods provide valuable assistance for human-led interpretation until more data becomes available. The survey guidance tool and the two machine learning applications have been packaged into the Reilig web application toolset, which is freely available
- …