Scientific Information Extraction with Semi-supervised Neural Tagging
This paper addresses the problem of extracting keyphrases from scientific
articles and categorizing them as corresponding to a task, process, or
material. We cast the problem as sequence tagging and introduce semi-supervised
methods to a neural tagging model, which builds on recent advances in named
entity recognition. Since annotated training data is scarce in this domain, we
introduce a graph-based semi-supervised algorithm together with a data
selection scheme to leverage unannotated articles. Both inductive and
transductive semi-supervised learning strategies outperform the previous state
of the art in information extraction on the 2017 SemEval Task 10 ScienceIE task.
Comment: accepted by EMNLP 201
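To illustrate what casting keyphrase extraction as sequence tagging can look like, here is a minimal sketch that decodes per-token tags back into labelled phrases. The BIO encoding, the tag names (`B-Process`, `I-Material`, and so on), and the helper `decode_bio` are assumptions made for illustration; the abstract does not state which tagging scheme the paper uses.

```python
def decode_bio(tokens, tags):
    """Collect (label, phrase) spans from BIO tags (assumed encoding)."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span that was still open.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continue the open span only if the label matches.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes the open span.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


tokens = ["We", "train", "a", "neural", "tagger", "on", "silicon", "wafers"]
tags = ["O", "O", "O", "B-Process", "I-Process", "O", "B-Material", "I-Material"]
spans = decode_bio(tokens, tags)
```

Under this encoding, a tagger only has to predict one tag per token, and the three keyphrase categories (task, process, material) become part of the tag vocabulary.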
Gamma-based clustering via ordered means with application to gene-expression analysis
Discrete mixture models provide a well-known basis for effective clustering
algorithms, although technical challenges have limited their scope. In the
context of gene-expression data analysis, a model is presented that mixes over
a finite catalog of structures, each one representing equality and inequality
constraints among latent expected values. Computations depend on the
probability that independent gamma-distributed variables attain each of their
possible orderings. Each ordering event is equivalent to an event in
independent negative-binomial random variables, and this finding guides a
dynamic-programming calculation. The structuring of mixture-model components
according to constraints among latent means leads to strict concavity of the
mixture log likelihood. In addition to its beneficial numerical properties, the
clustering method shows promising results in an empirical study.
Comment: Published at http://dx.doi.org/10.1214/10-AOS805 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
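The link between gamma orderings and negative-binomial events can be checked in the simplest case. The sketch below assumes only two independent gamma variables X and Y with integer shapes a and b and a common rate; it is not the paper's dynamic program, which handles general orderings. In that special case, P(X < Y) equals the probability that a negative-binomial count (the number of Y-events seen before the a-th X-event in a merged Poisson stream) is below b. The function names are hypothetical.

```python
from math import comb


def p_less_binomial(a, b):
    """P(X < Y) as a binomial tail: at least a of the first a+b-1 merged
    events belong to X, each with probability 1/2 (equal rates assumed)."""
    n = a + b - 1
    return sum(comb(n, j) for j in range(a, n + 1)) / 2 ** n


def p_less_negbin(a, b):
    """Same probability as a negative-binomial event: K < b, where K counts
    Y-events before the a-th X-event, so P(K = k) = C(k+a-1, k) / 2^(a+k)."""
    return sum(comb(k + a - 1, k) * 0.5 ** (a + k) for k in range(b))


prob_binomial = p_less_binomial(2, 3)
prob_negbin = p_less_negbin(2, 3)
```

Both functions compute the same quantity by different routes, which is the two-variable version of the equivalence the abstract describes.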
A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
In this paper, we propose a new approach to construct a system of
transformation rules for the Part-of-Speech (POS) tagging task. Our approach is
based on an incremental knowledge acquisition method where rules are stored in
an exception structure and new rules are only added to correct the errors of
existing rules, thus allowing systematic control of the interaction between the
rules. Experimental results on 13 languages show that our approach is fast in
terms of training time and tagging speed. Furthermore, our approach obtains
very competitive accuracy in comparison to state-of-the-art POS and
morphological taggers.
Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the
European Journal on Artificial Intelligence. Version 3: Resubmitted after
major revisions. Version 4: Resubmitted after minor revisions. Version 5: to
appear in AI Communications (accepted for publication on 3/12/2015
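The exception structure described in this abstract can be sketched as a small tree in which each rule has an "except" branch (tried when the rule fires) and an "else" branch (tried when it does not). This is a generic single-classification ripple-down-rules sketch, not the authors' implementation; the class `Rule`, the function `classify`, the toy conditions, and the tags are all hypothetical.

```python
class Rule:
    """One node of the exception structure (hypothetical layout)."""

    def __init__(self, condition, conclusion):
        self.condition = condition    # callable: case -> bool
        self.conclusion = conclusion  # tag proposed when the condition holds
        self.except_rule = None       # refinement tried if this rule fires
        self.else_rule = None         # alternative tried if it does not


def classify(rule, case, default=None):
    """Return the conclusion of the last rule that fired along the path."""
    result, node = default, rule
    while node is not None:
        if node.condition(case):
            result = node.conclusion
            node = node.except_rule
        else:
            node = node.else_rule
    return result


# Toy rule base: default tag NN, corrected by exceptions added later.
root = Rule(lambda c: True, "NN")
root.except_rule = Rule(lambda c: c["word"].endswith("ly"), "RB")
root.except_rule.except_rule = Rule(lambda c: c["word"] == "family", "NN")
root.except_rule.else_rule = Rule(lambda c: c["prev"] == "to", "VB")

tag_quickly = classify(root, {"word": "quickly", "prev": "ran"})
tag_family = classify(root, {"word": "family", "prev": "the"})
tag_run = classify(root, {"word": "run", "prev": "to"})
```

Because a new rule is only reachable through the rule whose error it corrects, adding it cannot change the outcome for cases that never reach that point in the tree.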
Node Cardinality Estimation in the Internet of Things Using Privileged Feature Distillation
The Internet of Things (IoT) is emerging as a critical technology to connect
resource-constrained devices such as sensors and actuators as well as
appliances to the Internet. In this paper, we propose a novel methodology for
node cardinality estimation in wireless networks such as the IoT and
Radio-Frequency IDentification (RFID) systems, which uses the privileged
feature distillation (PFD) technique and works using a neural network with a
teacher-student model. The teacher is trained using both privileged and regular
features, and the student is trained with predictions from the teacher and
regular features. We propose node cardinality estimation algorithms based on
the PFD technique for homogeneous as well as heterogeneous wireless networks.
We show via extensive simulations that the proposed PFD-based algorithms for
homogeneous as well as heterogeneous networks achieve much lower mean squared
errors in the computed node cardinality estimates than state-of-the-art
protocols proposed in prior work, while taking the same number of time slots
for executing the node cardinality estimation process as the latter protocols.
Comment: 15 pages, 17 figures, journal pape
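The teacher-student training flow can be illustrated with a deliberately tiny stand-in. The sketch below swaps the paper's neural networks for least-squares linear models without intercepts, and uses made-up numbers; only the data flow follows the abstract (teacher sees regular and privileged features, student sees regular features plus the teacher's predictions). All names are hypothetical.

```python
def fit_two_features(xs, zs, ys):
    """Least-squares fit of y ~ a*x + b*z via the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    szz = sum(z * z for z in zs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    szy = sum(z * y for z, y in zip(zs, ys))
    det = sxx * szz - sxz * sxz
    return (sxy * szz - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def fit_one_feature(xs, targets):
    """Least-squares fit of target ~ c*x."""
    return sum(x * t for x, t in zip(xs, targets)) / sum(x * x for x in xs)


regular = [1.0, 2.0, 3.0, 4.0]      # available at training and test time
privileged = [1.0, 0.0, 1.0, 0.0]   # available only while training
labels = [3.0, 4.0, 7.0, 8.0]       # toy targets (2*regular + privileged)

# Teacher: trained on regular AND privileged features.
a, b = fit_two_features(regular, privileged, labels)
teacher_preds = [a * x + b * z for x, z in zip(regular, privileged)]

# Student: trained on regular features only, against the teacher's outputs.
c = fit_one_feature(regular, teacher_preds)
```

At inference time only the student is used, so the privileged feature never has to be measured in the deployed network.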
Distributed Wireless Algorithms for RFID Systems: Grouping Proofs and Cardinality Estimation
The breadth and depth of the use of Radio Frequency Identification (RFID) are becoming more substantial. RFID is a technology useful for identifying unique items through radio waves. We design algorithms on RFID-based systems for the Grouping Proof and Cardinality Estimation problems.
A grouping proof is evidence that a reader simultaneously scanned the RFID tags in a group. In many practical scenarios, grouping proofs greatly expand the potential of RFID-based systems, for example in supply chain applications, simultaneous scanning of multiple forms of ID in banks or airports, and government paperwork. The design of RFID grouping proofs that provide optimal security, privacy, and efficiency is largely an open area, with challenging problems including robust privacy mechanisms, handling completeness and incompleteness (missing tags), and allowing dynamic group definitions. In this work we present three variations of grouping-proof protocols that implement our mechanisms to overcome these challenges.
Cardinality estimation is the problem in which the reader determines the number of tags in its communication range. Speed and accuracy are important goals. Many practical applications need an accurate and anonymous estimate of the number of tagged objects. Examples include intelligent transportation and stadium management. We provide an optimal algorithm template for cardinality estimation on a {0,1,e} channel, which extends to most estimators and, possibly, to a high-resolution {0,1,...,k-1,e} channel.
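As a concrete example of an estimator that works from slot outcomes, here is a sketch of the classic empty-slot ("zero") estimator. It is a textbook technique, not the optimal template this thesis proposes. It assumes that each tag replies in one uniformly chosen slot of a frame and that a slot outcome is 0 (empty), 1 (exactly one reply), or "e" (collision); that reading of the {0,1,e} channel and the function name are assumptions.

```python
import math


def zero_estimate(slot_outcomes):
    """Estimate the tag count from the fraction of empty slots.

    With n tags and L slots, P(slot empty) = (1 - 1/L)^n ~ exp(-n/L),
    so n is approximately -L * ln(empty / L).
    """
    frame = len(slot_outcomes)
    empty = sum(1 for outcome in slot_outcomes if outcome == 0)
    if empty == 0:
        raise ValueError("no empty slots: frame too small to estimate")
    return -frame * math.log(empty / frame)


# Toy frame of 100 slots: 50 empty, 30 singletons, 20 collisions.
outcomes = [0] * 50 + [1] * 30 + ["e"] * 20
estimate = zero_estimate(outcomes)
```

The estimator never needs tag identities, only the per-slot channel state, which is why this family of methods suits anonymous counting.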
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A range of
machine learning methods has been used, but they often lack the expressiveness
and modeling flexibility needed to learn effective representations. Recent
advances in deep learning across a wide range of applications have demonstrated
its capability to learn robust and powerful feature representations for complex
data. In particular, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which makes them well suited to text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results demonstrate
the effectiveness of the proposed supervised learning models for text hashing.
Comment: 11 pages, 4 figure
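To show why compact binary codes make similarity search cheap, here is a minimal sketch of nearest-neighbour lookup by Hamming distance. The 8-bit codes, document names, and helper functions are invented for illustration; in the paper the codes would come from the learned generative models, which this sketch does not implement.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def nearest(query, codes, k=2):
    """Return the k document names whose codes are closest to the query."""
    return sorted(codes, key=lambda name: hamming(query, codes[name]))[:k]


# Hypothetical 8-bit hash codes for three documents.
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}
query = 0b10110110
top = nearest(query, codes)
```

Each comparison is a single XOR plus a bit count, so a lookup over millions of codes stays cheap compared with dense vector similarity.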