48,434 research outputs found
Enhanced Integrated Scoring for Cleaning Dirty Texts
An increasing number of approaches for ontology engineering from text are
gearing towards the use of online sources such as company intranet and the
World Wide Web. Despite such rise, not much work can be found in aspects of
preprocessing and cleaning dirty texts from online sources. This paper presents
an enhancement of an Integrated Scoring for Spelling error correction,
Abbreviation expansion and Case restoration (ISSAC). ISSAC is implemented as
part of a text preprocessing phase in an ontology engineering system. New
evaluations performed on the enhanced ISSAC using 700 chat records reveal an
improved accuracy of 98% as compared to 96.5% and 71% based on the use of only
basic ISSAC and of Aspell, respectively.Comment: More information is available at
http://explorer.csse.uwa.edu.au/reference
CloudScan - A configuration-free invoice analysis system using recurrent neural networks
We present CloudScan; an invoice analysis system that requires zero
configuration or upfront annotation. In contrast to previous work, CloudScan
does not rely on templates of invoice layout, instead it learns a single global
model of invoices that naturally generalizes to unseen invoice layouts. The
model is trained using data automatically extracted from end-user provided
feedback. This automatic training data extraction removes the requirement for
users to annotate the data precisely. We describe a recurrent neural network
model that can capture long range context and compare it to a baseline logistic
regression model corresponding to the current CloudScan production system. We
train and evaluate the system on 8 important fields using a dataset of 326,471
invoices. The recurrent neural network and baseline model achieve 0.891 and
0.887 average F1 scores respectively on seen invoice layouts. For the harder
task of unseen invoice layouts, the recurrent neural network model outperforms
the baseline with 0.840 average F1 compared to 0.788.Comment: Presented at ICDAR 201
Towards Automatic Extraction of Social Networks of Organizations in PubMed Abstracts
Social Network Analysis (SNA) of organizations can attract great interest
from government agencies and scientists for its ability to boost translational
research and accelerate the process of converting research to care. For SNA of
a particular disease area, we need to identify the key research groups in that
area by mining the affiliation information from PubMed. This not only involves
recognizing the organization names in the affiliation string, but also
resolving ambiguities to identify the article with a unique organization. We
present here a process of normalization that involves clustering based on local
sequence alignment metrics and local learning based on finding connected
components. We demonstrate the application of the method by analyzing
organizations involved in angiogenensis treatment, and demonstrating the
utility of the results for researchers in the pharmaceutical and biotechnology
industries or national funding agencies.Comment: This paper has been withdrawn; First International Workshop on Graph
Techniques for Biomedical Networks in Conjunction with IEEE International
Conference on Bioinformatics and Biomedicine, Washington D.C., USA, Nov. 1-4,
2009; http://www.public.asu.edu/~sjonnal3/home/papers/IEEE%20BIBM%202009.pd
- …