Search CORE

3,249,810 research outputs found

Visual analysis of document triage data

Author: Buchanan George R.
Laramee Robert S.
Loizides Fernando
Zhao Geng
Publication date: 01/01/2011

As part of the information seeking process, a large amount of effort is invested to study and understand how information seekers search through documents so that they can assess their relevance. This search and assessment of document relevance, known as document triage, is an important information seeking process, but is not yet well understood. Human-computer interaction (HCI) and digital library scientists have undertaken a series of user studies involving information seeking and collected a large amount of data describing information seekers' behavior during document search. At the same time, we have witnessed a rapid increase in the number of off-the-shelf visualization tools that can benefit the study of document triage. Here we set out to utilize existing information visualization techniques and tools in order to gain a better understanding of the large amount of user-study data collected by HCI and digital library researchers. We describe the range of available tools and visualizations we use in order to increase our knowledge of document triage. Treemap, parallel coordinates, stack graph, matrix chart, as well as other visualization methods, prove to be insightful in exploring, analyzing and presenting user behavior during document triage. Our findings and visualizations are evaluated by HCI and digital library researchers studying this problem.

Ktisis

Better Document-level Sentiment Analysis from RST Discourse Parsing

Author: Bhatia Parminder
Eisenstein Jacob
Ji Yangfeng
Publication date: 01/01/2015

Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position in a dependency representation of the rhetorical structure can yield substantial improvements on lexicon-based sentiment analysis. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based methods. Comment: Published at Empirical Methods in Natural Language Processing (EMNLP 2015).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Construction and evaluation of classifiers for forensic document analysis

Author: Davis Linda J.
Gantz Donald T.
Lamas Andrea C.
Miller John J.
Saunders Christopher P.
Publication venue: Institute of Mathematical Statistics
Publication date: 28/06/2011

In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near perfect accuracy using leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers. Comment: Published at http://dx.doi.org/10.1214/10-AOAS379 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

Crossref

Towards a document structure editor for software requirements analysis

Author: Kowalski Vincent J.
Lekkos Anthony A.

Of the six or seven phases of the software engineering life cycle, requirements analysis tends to be the least understood and the least formalized. Correspondingly, a scarcity of useful software tools exists which aid in the development of user and system requirements. It is proposed that requirements analysis should culminate in a set of documents similar to those that usually accompany a delivered software product. The design of a software tool, the Document Structure Editor, which facilitates the development of such documentation

NASA Technical Reports Server

Document Style Recognition Using Shallow Statistical Analysis

Author: Braslavski P.
Браславский П. И.
Publication date: 01/01/2004

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

Author: Derpanis Konstantinos G.
Harley Adam W.
Ufkes Alex
Publication date: 25/02/2015

This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analysis, and confirms that this representation strategy is superior to a variety of popular hand-crafted alternatives. Experiments also show that (i) features extracted from CNNs are robust to compression, (ii) CNNs trained on non-document images transfer well to document analysis tasks, and (iii) enforcing region-specific feature-learning is unnecessary given sufficient training data. This work also makes available a new labelled subset of the IIT-CDIP collection, containing 400,000 document images across 16 categories, useful for training new CNNs for document analysis

arXiv.org e-Print Archive

Crossref

Exploratory Analysis of Highly Heterogeneous Document Collections

Author: Blei D. M.
Bun K. K.
Maiya A. S.
Manning C. D.
Mihalcea R.
Pecina P.
Ranganathan S. R.
Wagstaff K.
Publication date: 01/01/2013

We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Role Of Local Authorities In Health Issues: A Policy Document Analysis

Author: Coleman A.
Gadsby Erica W.
Peckham Stephen
Riches N.
Publication venue: PRUComm
Publication date: 01/03/2015

Prior to the passing of the Health and Social Care Act 2012, the Communities and Local Government (CLG) Select Committee conducted an investigation into the proposed changes to the Public Health System in England. The Committee considered 40 written submissions and heard oral evidence from 26 expert witnesses. Their report, which included complete transcripts of both oral and written submissions, provided rich and informed data on which to base an analysis of the proposed new public health system. This report analyses the main themes that emerged from the evidence submissions and forms part of our preliminary work for PRUComm’s PHOENIX project examining the development of the new public health system.

Kent Academic Repository

The University of Manchester - Institutional Repository