98,285 research outputs found
Construction and evaluation of classifiers for forensic document analysis
In this study we illustrate a statistical approach to questioned document
examination. Specifically, we consider the construction of three classifiers
that predict the writer of a sample document based on categorical data. To
evaluate these classifiers, we use a data set with a large number of writers
and a small number of writing samples per writer. Since the resulting
classifiers were found to have near perfect accuracy using leave-one-out
cross-validation, we propose a novel Bayesian-based cross-validation method for
evaluating the classifiers. Comment: Published at http://dx.doi.org/10.1214/10-AOAS379 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
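As a rough illustration of the leave-one-out cross-validation mentioned in the abstract above, the sketch below holds out each writing sample in turn and predicts its writer from the remaining samples. The nearest-neighbour rule under Hamming distance, the function names, and the toy data are all assumptions made for illustration; they are not the three classifiers or the data set from the paper.

```python
def hamming(a, b):
    """Count the positions at which two categorical feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_writer(train, sample):
    """Return the writer of the training sample closest to `sample` (1-NN)."""
    nearest = min(train, key=lambda item: hamming(item[0], sample))
    return nearest[1]


def leave_one_out_accuracy(data):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, writer) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        if predict_writer(train, features) == writer:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: (categorical features, writer label).
data = [
    (("a", "x", "p"), "w1"),
    (("a", "x", "q"), "w1"),
    (("b", "y", "p"), "w2"),
    (("b", "y", "q"), "w2"),
]

accuracy = leave_one_out_accuracy(data)
```

With only a few samples per writer, each held-out sample removes a large share of that writer's training data, which is one reason leave-one-out estimates in this setting deserve scrutiny.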
The Role Of Local Authorities In Health Issues: A Policy Document Analysis
Prior to the passing of the Health and Social Care Act 2012, the Communities and Local Government (CLG) Select Committee conducted an investigation into the proposed changes to the Public Health System in England. The Committee considered 40 written submissions and heard oral evidence from 26 expert witnesses. Their report, which included complete transcripts of both oral and written submissions, provided rich and informed data on which to base an analysis of the proposed new public health system. This report analyses the main themes that emerged from the evidence submissions and forms part of our preliminary work for PRUComm’s PHOENIX project examining the development of the new public health system
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis
Learning about Qualitative Document Analysis
This paper outlines and reflects on the process of undertaking a Qualitative Document Analysis (QDA) on policy and ‘practice’ documents in the rural water sector. This paper is relevant to organisations or researchers interested in research or evaluation methodologies that can provide a systematic analysis of policies and also serve as an engagement tool.
The QDA was undertaken as part of the Triple-S (Sustainable Services at Scale) initiative, for which the Impact and Learning Team (ILT) at IDS serves as an External Learning Facilitator. The strengths and weaknesses of the methodology are discussed here. Overall, the team found that the QDA exercise provided useful information about trends and gaps in the rural water sector, helped to refine the Triple-S engagement strategy, and served as a useful platform for engagement with partner organisations
ODA-based modeling for document analysis
This article proposes the document model of a hybrid knowledge-based document analysis system for business letters. The model combines the requirements of an object-oriented representation of both documents and the knowledge necessary for analysis tasks, and is based on the ODA platform. Model-driven document analysis increases the flexibility of a system because several analysis specialists can co-operate to assist each other and to improve the results of analysis. The inherent modularity of the system allows knowledge sources and integral constituents of the architecture to be reused for other document classes such as forms or cheques
Knowledge-Based Approach to Document Analysis
The paper presents an approach to extracting facts from document texts. The approach is based
on knowledge about the subject domain, a specialized dictionary, and fact schemes that describe fact
structures, taking into consideration both the semantic and the syntactic compatibility of fact elements. The
extracted facts combine into one structure the dictionary lexical objects found in the text and match them against
concepts of the subject domain ontology
Role of verbs in document analysis
We present results of two methods for assessing the event profile of news articles as a function of verb type. The unique contribution of this research is the focus on the role of verbs, rather than nouns. Two algorithms are presented and evaluated, one of which is shown to accurately discriminate documents by type and semantic properties, i.e. the event profile. The initial method, using WordNet (Miller et al. 1990), produced multiple cross-classifications of articles, primarily due to the bushy nature of the verb tree coupled with the sense disambiguation problem. Our second approach, using English Verb Classes and Alternations (EVCA) (Levin 1993), showed that monosemous categorization of the frequent verbs in WSJ made it possible to usefully discriminate documents. For example, our results show that articles in which communication verbs predominate tend to be opinion pieces, whereas articles with a high percentage of agreement verbs tend to be about mergers or legal cases. An evaluation is performed on the results using Kendall's τ. We present convincing evidence for using verb semantic classes as a discriminant in document classification
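For readers unfamiliar with the Kendall's τ statistic used in the evaluation above, the following is a minimal sketch of the tau-a variant, which compares two rankings by counting concordant and discordant pairs. The function name and the example rankings are hypothetical; the abstract does not say which variant of τ the authors used or how ties were handled.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
        # sign == 0 means a tie; tau-a counts it as neither
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical rankings of four items by two different methods.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

A value of 1 means the two rankings agree on every pair, and -1 means they are exact reverses of each other.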