Construction and evaluation of classifiers for forensic document analysis

Author: Davis Linda J.
Gantz Donald T.
Lamas Andrea C.
Miller John J.
Saunders Christopher P.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 28/06/2011

In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near-perfect accuracy using leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers. Comment: Published at http://dx.doi.org/10.1214/10-AOAS379 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
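The leave-one-out cross-validation mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the abstract does not describe the three classifiers, so a hypothetical 1-nearest-neighbour rule with Hamming distance over categorical features stands in for them, and the names `hamming`, `predict_1nn`, and `loo_accuracy` as well as the toy data are assumptions.

```python
def hamming(a, b):
    """Count the positions at which two categorical feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train, query):
    """Return the writer label of the training sample closest to `query`.

    `train` is a list of (features, writer) pairs. This stand-in classifier
    is an assumption, not one of the classifiers from the paper.
    """
    return min(train, key=lambda sample: hamming(sample[0], query))[1]


def loo_accuracy(samples):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    correct = 0
    for i, (features, writer) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # every sample except the i-th
        correct += predict_1nn(rest, features) == writer
    return correct / len(samples)


# Toy data (hypothetical): two writers, two writing samples each.
toy_samples = [
    (("a", "b", "c"), "w1"),
    (("a", "b", "d"), "w1"),
    (("x", "y", "z"), "w2"),
    (("x", "y", "q"), "w2"),
]
toy_accuracy = loo_accuracy(toy_samples)
```

With few samples per writer, each held-out sample leaves very little data for its own writer, which is one reason a leave-one-out estimate alone may be a weak way to compare classifiers.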

arXiv.org e-Print Archive

Crossref

Query by String word spotting based on character bi-gram indexing

Author: Ghosh Suman K.
Valveny Ernest
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/05/2015

In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally, we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasets. Comment: To be published in ICDAR201
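The string side of the PHOC encoding named in the abstract can be sketched as below. This is a minimal sketch, assuming the commonly described construction in which each pyramid level splits the word into equal regions and a character is assigned to a region when at least half of its extent falls inside it. The alphabet, the default levels, and the function name `phoc` are assumptions, and the bi-gram components and image-side attribute models from the paper are omitted.

```python
import string

# Assumed alphabet: lowercase Latin letters plus digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(1, 2, 3), alphabet=ALPHABET):
    """Binary pyramidal histogram of characters for a query string.

    For each level, the word is split into `level` equal regions, and each
    region contributes one binary vector over `alphabet`. Integer arithmetic
    (units of 1 / (len(word) * level)) keeps the 50% overlap test exact.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            r_lo, r_hi = region * n, (region + 1) * n
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the assumed alphabet are ignored
                c_lo, c_hi = i * level, (i + 1) * level
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # The character spans `level` units; keep it if >= 50% overlaps.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            vector.extend(bits)
    return vector


# Example: "ab" at levels 1 and 2 gives three regions of 36 bits each.
example = phoc("ab", levels=(1, 2))
```

Because strings and word images are mapped into the same attribute space, a query string encoded this way can be compared directly with attribute scores predicted for document regions.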

arXiv.org e-Print Archive

Crossref

A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

Author: Alberti Michele
Fischer Andreas
Goktepe Pinar
Ingold Rolf
Kolonko Thomas
Liwicki Marcus
Pondenkandath Vinaychandran
Studer Linda
Publication date: 22/05/2019

Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.

arXiv.org e-Print Archive

Crossref

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

For Geometric Inference from Images, What Kind of Statistical Model Is Necessary?

Author: Kanatani Kenichi
Publication venue: Faculty of Engineering, Okayama University
Publication date: 01/11/2002

In order to facilitate smooth communications with researchers in other fields including statistics, this paper investigates the meaning of "statistical methods" for geometric inference based on image feature points. We point out that statistical analysis does not make sense unless the underlying "statistical ensemble" is clearly defined. We trace back the origin of feature uncertainty to image processing operations for computer vision in general and discuss the implications of asymptotic analysis for performance evaluation in reference to "geometric fitting", "geometric model selection", the "geometric AIC", and the "geometric MDL". Referring to such statistical concepts as "nuisance parameters", the "Neyman-Scott problem", and "semiparametric models", we point out that simulation experiments for performance evaluation will lose meaning without carefully considering the assumptions involved and intended applications.

Okayama University Scientific Achievement Repository

Associative and repetition priming with the repeated masked prime technique: No priming found

Author: Avons SE
Cameron Marie
Cinel Caterina
Glynn Kevin
McDonald Rebecca
Russo Riccardo
Verolini Veronica
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2008

Wentura and Frings (2005) reported evidence of subliminal categorical priming on a lexical decision task, using a new method of visual masking in which the prime string consisted of the prime word flanked by random consonants and random letter masks alternated with the prime string on successive refresh cycles. We investigated associative and repetition priming on lexical decision, using the same method of visual masking. Three experiments failed to show any evidence of associative priming, (1) when the prime string was fixed at 10 characters (three to six flanking letters) and (2) when the flanking letters were reduced in number or absent. In all cases, prime detection was at chance level. Strong associative priming was observed with visible unmasked primes, but the addition of flanking letters restricted priming even though prime detection was still high. With repetition priming, no priming effects were found with the repeated masked technique, and prime detection was poor but just above chance levels. We conclude that with repeated masked primes, there is effective visual masking but that associative priming and repetition priming do not occur with experiment-unique prime-target pairs. Explanations for this apparent discrepancy across priming paradigms are discussed. The priming stimuli and prime-target pairs used in this study may be downloaded as supplemental materials from mc.psychonomic-journals.org/content/supplemental. © 2009 The Psychonomic Society, Inc.

University of Essex Research Repository

On the Feasibility of Malware Authorship Attribution

Author: A Rahimian
C Kruegel
DE Knuth
DI Holmes
EH Spafford
F Can
G Frantzeskou
I Krsul
J Ferrante
M Fowler
N Pržulj
N Rosenblum
S Alrabaee
S Burrows
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/01/2017

There are many occasions in which the security community is interested in discovering the authorship of malware binaries, either for digital forensics analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software code written by human programmers. Existing studies of authorship attribution of general purpose software mainly focus on source code, which is typically based on the style of programs and environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Such program binaries often do not retain many semantic or stylistic features due to the compilation process. Therefore, authorship attribution in the domain of malware binaries based on features and styles that will survive the compilation process is challenging. This paper surveys the state of the art in this literature. Further, we analyze the features involved in those techniques. By using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries. Comment: FPS 201

arXiv.org e-Print Archive

Crossref