Implicit Language Model in LSTM for OCR
Neural networks have become the technique of choice for OCR, but many aspects
of how and why they deliver superior performance are still unknown. One key
difference between current neural network techniques using LSTMs and the
previous state-of-the-art HMM systems is that HMM systems have a strong
independence assumption. In comparison, LSTMs have no explicit constraints on
the amount of context that can be considered during decoding. In this paper we
show that they learn an implicit LM and attempt to characterize the strength of
the LM in terms of equivalent n-gram context. We show that this implicitly
learned language model provides a 2.4% CER improvement on our synthetic test
set when compared against a test set of random characters (i.e., not naturally
occurring sequences), and that the LSTM learns to use up to 5 characters of
context (roughly 88 frames in our configuration). We believe that this is the
first attempt at characterizing the strength of the implicit LM in LSTM-based
OCR systems.
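The metric behind the 2.4% figure, character error rate (CER), is the edit distance between the recognized string and the reference, divided by the reference length. A minimal sketch of that computation (standard Levenshtein CER, not the paper's own evaluation code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m

# An OCR model with an implicit LM should score better (lower CER) on
# natural text than on random character strings of the same length.
print(cer("language model", "languagemodel"))  # one deletion -> ~0.071
```

Comparing CER on natural versus randomized test sets, as the paper does, isolates how much of the model's accuracy comes from sequence context rather than per-character appearance.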
Deep Multimodal Image-Repurposing Detection
Nefarious actors on social media and other platforms often spread rumors and
falsehoods through images whose metadata (e.g., captions) have been modified to
provide visual substantiation of the rumor/falsehood. This type of modification
is referred to as image repurposing, in which often an unmanipulated image is
published along with incorrect or manipulated metadata to serve the actor's
ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR)
dataset, a substantially more challenging dataset than those previously
available to support research into image repurposing detection. The
new dataset includes location, person, and organization manipulations on
real-world data sourced from Flickr. We also present a novel, end-to-end, deep
multimodal learning model for assessing the integrity of an image by combining
information extracted from the image with related information from a knowledge
base. The proposed method is compared against state-of-the-art techniques on
existing datasets as well as MEIR, where it outperforms existing methods across
the board, with AUC improvements of up to 0.23.

Comment: To be published at ACM Multimedia 2018 (oral).
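The reported AUC (area under the ROC curve) has a useful pairwise interpretation: the probability that a randomly chosen manipulated example receives a higher detection score than a randomly chosen clean one. A minimal sketch of that formulation (illustrative only, not the paper's evaluation code):

```python
def auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation:
    fraction of positive/negative pairs ranked correctly, ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of manipulated (1) from clean (0) examples:
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
```

On this scale, an improvement of 0.23 is large: it means the model correctly ranks manipulated above clean in 23 more out of every 100 positive/negative pairs.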
MONet: Multi-scale Overlap Network for Duplication Detection in Biomedical Images
Manipulation of biomedical images to misrepresent experimental results has
plagued the biomedical community for a while. Recent interest in the problem
led to the curation of a dataset and associated tasks to promote the
development of biomedical forensic methods. Of these, the largest manipulation
detection task focuses on the detection of duplicated regions between images.
Traditional computer-vision based forensic models trained on natural images are
not designed to overcome the challenges presented by biomedical images. We
propose a multi-scale overlap detection model to detect duplicated image
regions. Our model is structured to find duplication hierarchically, so as to
reduce the number of patch operations. It achieves state-of-the-art performance
overall and on multiple biomedical image categories.

Comment: To appear at ICIP 202
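The hierarchical idea, rejecting most candidate pairs cheaply at a coarse level and spending full patch comparisons only on survivors, can be sketched with a simple hand-written heuristic. MONet itself is a learned multi-scale network; the code below only illustrates why a coarse-to-fine structure reduces patch operations (all names and thresholds are assumptions for the sketch):

```python
import numpy as np

def find_duplicates(a, b, patch=8, stride=8, tol=1e-3):
    """Naive duplicated-region check between two grayscale images.
    A cheap coarse test (patch means) filters pairs before the
    expensive full-patch comparison is run."""
    matches = []
    for ya in range(0, a.shape[0] - patch + 1, stride):
        for xa in range(0, a.shape[1] - patch + 1, stride):
            pa = a[ya:ya + patch, xa:xa + patch]
            ma = pa.mean()
            for yb in range(0, b.shape[0] - patch + 1, stride):
                for xb in range(0, b.shape[1] - patch + 1, stride):
                    pb = b[yb:yb + patch, xb:xb + patch]
                    if abs(ma - pb.mean()) > tol:
                        continue  # coarse reject: skip the full comparison
                    if np.abs(pa - pb).mean() < tol:  # fine check
                        matches.append(((ya, xa), (yb, xb)))
    return matches
```

For example, copying a region of one image into another and calling `find_duplicates` returns the matching patch coordinates; most pairs never reach the fine check, which is the operation-count saving the hierarchical design targets.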