4 research outputs found

    Optical Font Recognition in Smartphone-Captured Images, and its Applicability for ID Forgery Detection

    Full text link
    In this paper, we consider the problem of detecting counterfeit identity documents in images captured with smartphones. As the number of documents contain special fonts, we study the applicability of convolutional neural networks (CNNs) for detection of the conformance of the fonts used with the ones, corresponding to the government standards. Here, we use multi-task learning to differentiate samples by both fonts and characters and compare the resulting classifier with its analogue trained for binary font classification. We train neural networks for authenticity estimation of the fonts used in machine-readable zones and ID numbers of the Russian national passport and test them on samples of individual characters acquired from 3238 images of the Russian national passport. Our results show that the usage of multi-task learning increases sensitivity and specificity of the classifier. Moreover, the resulting CNNs demonstrate high generalization ability as they correctly classify fonts which were not present in the training set. We conclude that the proposed method is sufficient for authentication of the fonts and can be used as a part of the forgery detection system for images acquired with a smartphone camera

    OCR Graph Features for Manipulation Detection in Documents

    Full text link
    Detecting manipulations in digital documents is becoming increasingly important for information verification purposes. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all approaches in this domain rely on a procedural approach, using carefully generated features and a hand-tuned scoring system, rather than a data-driven and generalizable approach. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely-related document manipulation detection model on this task

    Method for Effective PDF Files Manipulation Detection

    Get PDF
    Käesoleva magistritöö eesmärgiks on lihtsustada PDF failides tehtud muudatuste tuvastamise protsessi kasutades faili lähtekoodi enne, kui liigutakse edasi teiste meetodite juurde nagu näiteks pilditöötlus. Lähtekoodi analüüs on mõeldud esimeseks sammuks, mis võimaldab säästa palju uurijate aega ning pakkuda rohkem tõestusmaterjali muudatuste tegemise kohta asitõendiks oleva digitaalse faili kohta. Magistritöö tulemusel valmib põhjalik ja efektiivne metoodika PDF failide terviklikkuse uurimiseks ja analüüsimiseks. Püstitatud eesmärgi saavutamiseks õpitakse kõigepealt tundma PDF faili ehitust mõistmaks faili struktuuri ja komponente. Seejärel tehakse ridamisi muudatusi faili lähtekoodis, mis võimaldab süveneda faili varjatud külgedesse ja leida haavatavaid kohti ning millest saadav informatsioon on abiks metoodika aluste paika panemisel. Failide enamlevinud muutmise tüüpide uurimisel saadakse kogum andmeid, millede suhtes hakatakse võrdlema uurimise all olevaid faile ning seeläbi testitakse faili tõepärasust. Lisaks otsitakse vabavaralisi tarkvarasid, millega antud ülesannet lahendada. Töö lõpetatakse kontrollkatsetega, sealhulgas hinnatakse saadud tulemusi ja märgitakse ära tuleviku tegevussuunad antud valdkonnas.The aim of this thesis is to ease the process of detecting manipulations in PDF files by addressing its source code, before having to use other methods such as image processing or text-line examination. It is intended to be a previous step to tackle, which can save a lot of time to examiners and provide them with more proof of manipulations regarding digital file evidence. The result is the construction of a solid and effective method for PDF file investigation and analysis to determine its integrity. To achieve this goal, a study of PDF file anatomy will be conducted firstly, in order to become familiar with the structure and composition of this file format. Afterwards, a series of manipulations performed directly against the file source code will deepen in its secrets and vulnerabilities, and will therefore help in setting the foundations for the method. Finally, a study on the most common types of file manipulations will lead to a set of layouts to which compare the files under investigation and thus, test its veracity, complemented with a quest for specialised open source tools to accomplish this task; a set of validation experiments will complete the work, evaluating the obtained results and stating future lines of work in this field
    corecore