67,598 research outputs found
Lifelong Learning CRF for Supervised Aspect Extraction
This paper makes a focused contribution to supervised aspect extraction. It
shows that if the system has performed aspect extraction from many past domains
and retained their results as knowledge, Conditional Random Fields (CRF) can
leverage this knowledge in a lifelong learning manner to extract in a new
domain markedly better than the traditional CRF without using this prior
knowledge. The key innovation is that even after CRF training, the model can
still improve its extraction with experiences in its applications.Comment: Accepted at ACL 2017. arXiv admin note: text overlap with
arXiv:1612.0794
Scatteract: Automated extraction of data from scatter plots
Charts are an excellent way to convey patterns and trends in data, but they
do not facilitate further modeling of the data or close inspection of
individual data points. We present a fully automated system for extracting the
numerical values of data points from images of scatter plots. We use deep
learning techniques to identify the key components of the chart, and optical
character recognition together with robust regression to map from pixels to the
coordinate system of the chart. We focus on scatter plots with linear scales,
which already have several interesting challenges. Previous work has done fully
automatic extraction for other types of charts, but to our knowledge this is
the first approach that is fully automatic for scatter plots. Our method
performs well, achieving successful data extraction on 89% of the plots in our
test set.Comment: Submitted to ECML PKDD 2017 proceedings, 16 page
Searching for Ground Truth: a stepping stone in automating genre classification
This paper examines genre classification of documents and
its role in enabling the effective automated management of digital documents by digital libraries and other repositories. We have previously presented genre classification as a valuable step toward achieving automated extraction of descriptive metadata for digital material. Here, we present results from experiments using human labellers, conducted to assist in genre characterisation and the prediction of obstacles which need to be overcome by an automated system, and to contribute to the process of creating a solid testbed corpus for extending automated genre classification and testing metadata extraction tools across genres. We also describe the performance of two classifiers based on image and stylistic modeling features in labelling the data resulting from the agreement of three human labellers across fifteen genre classes.
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to increase the quality images. This research focuses on enhancing the image quality in optical tomography. Image quality can be deļ¬ned by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE) parameters. The ļ¬ndings of this research prove that by combining parallel and fan beam projection, the image quality can be increased by more than 10%in terms of its PSNR value and more than 100% in terms of its NMSE value compared to a single parallel beam
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledgebase. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.Comment: Published as a conference paper at CIKM 201
- ā¦