19,439 research outputs found
Automatic extraction of paraphrastic phrases from medium size corpora
This paper presents a versatile system intended to acquire paraphrastic
phrases from a representative corpus. In order to decrease the time spent on
the elaboration of resources for NLP system (for example Information
Extraction, IE hereafter), we suggest to use a machine learning system that
helps defining new templates and associated resources. This knowledge is
automatically derived from the text collection, in interaction with a large
semantic network
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
Program Synthesis using Natural Language
Interacting with computers is a ubiquitous activity for millions of people.
Repetitive or specialized tasks often require creation of small, often one-off,
programs. End-users struggle with learning and using the myriad of
domain-specific languages (DSLs) to effectively accomplish these tasks.
We present a general framework for constructing program synthesizers that
take natural language (NL) inputs and produce expressions in a target DSL. The
framework takes as input a DSL definition and training data consisting of
NL/DSL pairs. From these it constructs a synthesizer by learning optimal
weights and classifiers (using NLP features) that rank the outputs of a
keyword-programming based translation. We applied our framework to three
domains: repetitive text editing, an intelligent tutoring system, and flight
information queries. On 1200+ English descriptions, the respective synthesizers
rank the desired program as the top-1 and top-3 for 80% and 90% descriptions
respectively
CRYSTAL: Inducing a Conceptual Dictionary
One of the central knowledge sources of an information extraction system is a
dictionary of linguistic patterns that can be used to identify the conceptual
content of a text. This paper describes CRYSTAL, a system which automatically
induces a dictionary of "concept-node definitions" sufficient to identify
relevant information from a training corpus. Each of these concept-node
definitions is generalized as far as possible without producing errors, so that
a minimum number of dictionary entries cover the positive training instances.
Because it tests the accuracy of each proposed definition, CRYSTAL can often
surpass human intuitions in creating reliable extraction rules.Comment: 6 pages, Postscript, IJCAI-95
http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.htm
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
In this paper we address the problem of learning robust cross-domain
representations for sketch-based image retrieval (SBIR). While most SBIR
approaches focus on extracting low- and mid-level descriptors for direct
feature matching, recent works have shown the benefit of learning coupled
feature representations to describe data from two related sources. However,
cross-domain representation learning methods are typically cast into non-convex
minimization problems that are difficult to optimize, leading to unsatisfactory
performance. Inspired by self-paced learning, a learning methodology designed
to overcome convergence issues related to local optima by exploiting the
samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced
partial curriculum learning (CPPCL) framework. Compared with existing
self-paced learning methods which only consider a single modality and cannot
deal with prior knowledge, CPPCL is specifically designed to assess the
learning pace by jointly handling data from dual sources and modality-specific
prior information provided in the form of partial curricula. Additionally,
thanks to the learned dictionaries, we demonstrate that the proposed CPPCL
embeds robust coupled representations for SBIR. Our approach is extensively
evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary
SBIR and TU-Berlin Extension datasets), showing superior performance over
competing SBIR methods
- …