
    Construction and evaluation of classifiers for forensic document analysis

    In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near-perfect accuracy using leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS379.
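    The abstract does not name the three classifiers, so the sketch below substitutes a categorical naive Bayes model with Laplace smoothing (an assumption, not the authors' method) to show how leave-one-out cross-validation scores a writer classifier: each sample is held out in turn, the model is refit on the remaining samples, and the held-out writer is predicted. The writers and feature values are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Fit a categorical naive Bayes model (a stand-in, not the paper's classifiers).

    samples: list of (writer, features), where features is a tuple of
    categorical values, one per hypothetical handwriting feature.
    """
    writer_counts = Counter(w for w, _ in samples)
    # feature_counts[writer][position][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed at each feature position
    for w, feats in samples:
        for i, v in enumerate(feats):
            feature_counts[w][i][v] += 1
            values[i].add(v)
    return writer_counts, feature_counts, values


def predict(model, feats, alpha=1.0):
    """Return the writer with the highest log posterior score."""
    writer_counts, feature_counts, values = model
    total = sum(writer_counts.values())
    best, best_score = None, -math.inf
    for w, n_w in writer_counts.items():
        score = math.log(n_w / total)  # log prior for this writer
        for i, v in enumerate(feats):
            k = len(values[i]) + 1  # +1 reserves probability mass for unseen values
            count = feature_counts[w][i][v]
            score += math.log((count + alpha) / (n_w + alpha * k))
        if score > best_score:
            best, best_score = w, score
    return best


def loocv_accuracy(samples):
    """Hold out each sample in turn, refit on the rest, and score the prediction."""
    correct = 0
    for j, (writer, feats) in enumerate(samples):
        model = train(samples[:j] + samples[j + 1:])
        correct += predict(model, feats) == writer
    return correct / len(samples)


# Hypothetical toy data: two samples for each of three writers.
samples = [
    ("A", ("loop", "slant", "open")), ("A", ("loop", "slant", "open")),
    ("B", ("plain", "upright", "closed")), ("B", ("plain", "upright", "closed")),
    ("C", ("hook", "back", "open")), ("C", ("hook", "back", "open")),
]
print(loocv_accuracy(samples))  # prints 1.0
```

    With only one or two samples per writer, holding a sample out can leave its writer with no training data at all; in that case this sketch cannot predict that writer, and the held-out sample is always counted as an error.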

    The Role Of Local Authorities In Health Issues: A Policy Document Analysis

    Prior to the passing of the Health and Social Care Act 2012, the Communities and Local Government (CLG) Select Committee conducted an investigation into the proposed changes to the Public Health System in England. The Committee considered 40 written submissions and heard oral evidence from 26 expert witnesses. Its report, which included complete transcripts of both the oral and written submissions, provided rich and informed data on which to base an analysis of the proposed new public health system. This report analyses the main themes that emerged from the evidence submissions and forms part of our preliminary work for PRUComm’s PHOENIX project, which examines the development of the new public health system.

    Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

    This paper establishes a new state of the art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analysis and confirms that this representation strategy is superior to a variety of popular hand-crafted alternatives. Experiments also show that (i) features extracted from CNNs are robust to compression, (ii) CNNs trained on non-document images transfer well to document analysis tasks, and (iii) enforcing region-specific feature learning is unnecessary given sufficient training data. This work also makes available a new labelled subset of the IIT-CDIP collection, containing 400,000 document images across 16 categories, which is useful for training new CNNs for document analysis.
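    The abstract does not specify how retrieval is scored once CNN features are extracted. A common choice, assumed here, is to rank stored documents by the cosine similarity of their feature vectors to the query's vector. The sketch below treats the features as already computed; the document IDs and three-dimensional vectors are hypothetical stand-ins for real CNN activations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve(query, index, k=3):
    """Return the k document IDs whose (assumed precomputed) features best match the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical index: document ID -> feature vector.
index = {
    "letter": [1.0, 0.0, 0.0],
    "form": [0.0, 1.0, 0.0],
    "memo": [0.9, 0.1, 0.0],
}
print(retrieve([1.0, 0.0, 0.0], index, k=2))
```

    Cosine similarity ignores vector magnitude, so two documents whose features point in the same direction match even if their activation scales differ; whether that suits a given feature extractor is a design choice, not something the abstract settles.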

    Learning about Qualitative Document Analysis

    This paper outlines and reflects on the process of undertaking a Qualitative Document Analysis (QDA) of policy and ‘practice’ documents in the rural water sector. It is relevant to organisations or researchers interested in research or evaluation methodologies that can provide a systematic analysis of policies and also serve as an engagement tool. The QDA was undertaken as part of the Triple-S (Sustainable Services at Scale) initiative, for which the Impact and Learning Team (ILT) at IDS serves as an External Learning Facilitator. The strengths and weaknesses of the methodology are discussed here. Overall, the team found that the QDA exercise provided useful information about trends and gaps in the rural water sector, helped to refine the Triple-S engagement strategy, and served as a useful platform for engagement with partner organisations.

    ODA-based modeling for document analysis

    This article proposes the document model of a hybrid knowledge-based document analysis system for business letters. The model combines the requirements of object-oriented representation of both documents and the knowledge necessary for analysis tasks, and is based on the ODA platform. Model-driven document analysis increases the flexibility of a system because several analysis specialists can cooperate, assisting each other and improving the results of analysis. The inherent modularity of the system allows knowledge sources and integral constituents of the architecture to be reused for other document classes, such as forms or cheques.

    Knowledge-Based Approach to Document Analysis

    The paper presents an approach to extracting facts from document texts. The approach uses knowledge about the subject domain, a specialized dictionary, and fact schemes that describe fact structures, taking into account both the semantic and the syntactic compatibility of fact elements. The extracted facts combine the dictionary lexical objects found in the text into a single structure and match them against concepts of the subject domain ontology.

    Role of verbs in document analysis

    We present the results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is its focus on the role of verbs, rather than nouns. Two algorithms are presented and evaluated, one of which is shown to accurately discriminate documents by type and semantic properties, i.e. the event profile. The initial method, using WordNet (Miller et al. 1990), produced multiple cross-classification of articles, primarily due to the bushy nature of the verb tree coupled with the sense disambiguation problem. Our second approach, using English Verb Classes and Alternations (EVCA; Levin 1993), showed that monosemous categorization of the frequent verbs in WSJ made it possible to usefully discriminate documents. For example, our results show that articles in which communication verbs predominate tend to be opinion pieces, whereas articles with a high percentage of agreement verbs tend to be about mergers or legal cases. The results are evaluated using Kendall's τ. We present convincing evidence for using verb semantic classes as a discriminant in document classification.
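    Kendall's τ measures agreement between two rankings by counting concordant pairs C and discordant pairs D over all n(n-1)/2 pairs of items: τ = (C - D) / (n(n-1)/2). The abstract does not say which variant of τ was used or exactly what was ranked, so the sketch below implements the basic tau-a, which counts tied pairs as neither concordant nor discordant; the example rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs contribute to neither count (no tie correction, unlike tau-b).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical: a system's ranking of four articles vs. a reference ranking.
system_rank = [1, 2, 3, 4]
reference_rank = [1, 3, 2, 4]
print(kendall_tau_a(system_rank, reference_rank))
```

    Only the pair of articles ranked 2 and 3 is swapped, so five of the six pairs agree and one disagrees, giving τ = (5 - 1) / 6 ≈ 0.67. For data with many ties, a tie-corrected variant such as tau-b would be the more usual choice.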