Optical Font Recognition in Smartphone-Captured Images, and its Applicability for ID Forgery Detection

Author: Berenguel
Bertrand
Bertrand
bin Kwon
Wang
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 18/10/2018

In this paper, we consider the problem of detecting counterfeit identity documents in images captured with smartphones. As a number of documents contain special fonts, we study the applicability of convolutional neural networks (CNNs) for detecting whether the fonts used conform to those specified by government standards. Here, we use multi-task learning to differentiate samples by both fonts and characters and compare the resulting classifier with its analogue trained for binary font classification. We train neural networks for authenticity estimation of the fonts used in machine-readable zones and ID numbers of the Russian national passport and test them on samples of individual characters acquired from 3238 images of the Russian national passport. Our results show that the use of multi-task learning increases the sensitivity and specificity of the classifier. Moreover, the resulting CNNs demonstrate high generalization ability, as they correctly classify fonts that were not present in the training set. We conclude that the proposed method is sufficient for authentication of the fonts and can be used as part of a forgery detection system for images acquired with a smartphone camera.

arXiv.org e-Print Archive

Crossref

OCR Graph Features for Manipulation Detection in Documents

Author: Gupta Otkrist
James Hailey
Raviv Dan
Publication date: 14/09/2020

Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all methods in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driven and generalizable approach. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.

arXiv.org e-Print Archive

Method for Effective PDF Files Manipulation Detection

Author: Fernández Bascuñana Gema
Publication date: 01/01/2017

The aim of this thesis is to ease the process of detecting manipulations in PDF files by examining the file's source code before resorting to other methods such as image processing or text-line examination. Source code analysis is intended as a preliminary step that can save examiners a lot of time and provide them with more proof of manipulation in digital file evidence. The result is a solid and effective method for investigating and analysing PDF files to determine their integrity. To achieve this goal, the anatomy of PDF files is studied first, in order to become familiar with the structure and composition of the file format. Afterwards, a series of manipulations performed directly on the file source code gives deeper insight into the format's hidden aspects and vulnerabilities, and thereby helps lay the foundations for the method. Finally, a study of the most common types of file manipulation yields a set of layouts against which files under investigation can be compared to test their veracity, complemented by a search for specialised open-source tools to accomplish this task. A set of validation experiments completes the work, evaluating the results obtained and stating future lines of work in this field.

DSpace at Tartu University Library