Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources.
Comment: To be presented at EMNLP 2018. 15 pages
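As a rough illustration of what recasting a labeled dataset into a common NLI structure can look like, here is a minimal Python sketch that turns one sentiment-labeled sentence into two context-hypothesis pairs. The `NLIPair` type, the `recast_sentiment` helper, the hypothesis template, and the binary `entailed`/`not-entailed` labels are all hypothetical assumptions made for this example; they are not the DNC's actual recasting code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIPair:
    context: str
    hypothesis: str
    label: str  # assumed binary labels: "entailed" / "not-entailed"


def recast_sentiment(sentence: str, is_positive: bool, speaker: str = "The reviewer") -> list[NLIPair]:
    """Recast one sentiment-labeled sentence into two NLI pairs via a hypothetical template."""
    context = f'{speaker} said, "{sentence}"'
    liked = f"{speaker} liked the item."
    disliked = f"{speaker} did not like the item."
    # The hypothesis matching the gold polarity is entailed; the other one is not.
    return [
        NLIPair(context, liked, "entailed" if is_positive else "not-entailed"),
        NLIPair(context, disliked, "not-entailed" if is_positive else "entailed"),
    ]


for pair in recast_sentiment("A delightful, sharply written film.", is_positive=True):
    print(pair.label, "|", pair.hypothesis)
# entailed | The reviewer liked the item.
# not-entailed | The reviewer did not like the item.
```

Generating one hypothesis per possible label keeps every recast pair in the same context/hypothesis/label shape, whatever the original annotation scheme was.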
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain-specific, textual resources that report on facts and events that
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge, based
on the semantic notions of events, participants and roles. We quantitatively
evaluate each key step of our approach and provide a graph-based
representation of the extracted knowledge, which allows users to move between
close and distant reading of the collection.
Comment: 23 pages, 6 figures
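To make the idea of a graph over events, participants and roles concrete, the following Python sketch stores extraction output as `(event, role, participant)` triples and links events to participants. The triples, role labels (`Agent`, `Location`, `Theme`) and identifiers are hypothetical placeholders, not data or a schema from the paper; the two helper functions only suggest how one graph can serve both a "close" view of a single event and a "distant" aggregate view of a participant.

```python
from collections import defaultdict

# Hypothetical toy extraction output: (event_id, role, participant) triples.
# Role labels and identifiers are illustrative placeholders, not from the paper.
TRIPLES = [
    ("e1", "Agent", "soldier_A"),
    ("e1", "Location", "village_B"),
    ("e2", "Agent", "soldier_A"),
    ("e2", "Theme", "bridge_C"),
]


def build_graph(triples):
    """Build an undirected event/participant graph whose edges carry the semantic role."""
    graph = defaultdict(list)
    for event, role, participant in triples:
        graph[event].append((role, participant))  # event -> its role fillers
        graph[participant].append((role, event))  # participant -> events it appears in
    return graph


def close_view(graph, event):
    """Roles and participants of a single event (a close look at one passage)."""
    return sorted(graph[event])


def distant_view(graph, participant):
    """All events a participant takes part in (an aggregate, distant view)."""
    return sorted(event for _, event in graph[participant])


graph = build_graph(TRIPLES)
print(close_view(graph, "e1"))           # [('Agent', 'soldier_A'), ('Location', 'village_B')]
print(distant_view(graph, "soldier_A"))  # ['e1', 'e2']
```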
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged into a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will help
users visualize and browse social content and organize their own content.
However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, compared to previous work, the approach produces
larger, more accurate folksonomies and scales better.
Comment: 10 pages. To appear in the Proceedings of the ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD) 201
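The idea of weaving many shallow personal hierarchies into one deeper, bushier tree can be sketched in a few lines of Python. This is a deliberately simplified greedy merge, not the authors' relational clustering: it assumes each personal hierarchy is a hypothetical `(root_tag, child_tags)` pair, merges hierarchies with the same root only when their children overlap (Jaccard similarity at or above an assumed `threshold`), and gains depth whenever a root tag already occurs as a leaf elsewhere.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two child-tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weave(hierarchies: list[tuple[str, set[str]]], threshold: float = 0.2) -> dict[str, set[str]]:
    """Greedily merge shallow (root, children) hierarchies into one parent -> children map."""
    tree: dict[str, set[str]] = {}
    for root, children in hierarchies:
        if root not in tree:
            # New root tag: add it. If the tag already occurs as a leaf elsewhere,
            # that leaf now has children, so the woven tree gets deeper.
            tree[root] = set(children)
        elif jaccard(tree[root], children) >= threshold:
            # Same root tag with similar children: merge, making the tree bushier.
            tree[root] |= children
        # Otherwise the shared root tag may be used in another sense (ambiguity): skip it.
    return tree


def depth(tree: dict[str, set[str]], node: str) -> int:
    """Number of levels from `node` down to its deepest leaf (assumes no cycles)."""
    return 1 + max((depth(tree, child) for child in tree.get(node, ())), default=0)


# Hypothetical personal hierarchies (placeholders, not Flickr data).
personal = [
    ("travel", {"europe", "asia"}),
    ("europe", {"france", "italy"}),
    ("travel", {"europe", "africa"}),
    ("travel", {"beach", "camping"}),  # no shared children, so it is not merged
]
folksonomy = weave(personal)
print(sorted(folksonomy["travel"]))  # ['africa', 'asia', 'europe']
print(depth(folksonomy, "travel"))   # 3
```

The similarity check is a crude stand-in for handling the sparse, ambiguous and noisy metadata the abstract mentions; a real system would also use tag statistics and resolve conflicts rather than simply skipping them.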
Punny Captions: Witty Wordplay in Image Descriptions
Wit is a form of rich interaction that is often grounded in a specific
situation (e.g., a comment in response to an event). In this work, we attempt
to build computational models that can produce witty descriptions for a given
image. Inspired by a cognitive account of humor appreciation, we employ
linguistic wordplay, specifically puns, in image descriptions. We develop two
approaches: retrieving witty descriptions for a given image from a large
corpus of sentences, and generating them via an encoder-decoder neural
network architecture. We compare our approaches against meaningful baselines
via human studies and show substantial improvements. We find that when humans
are subject to constraints on word usage and style similar to the model's,
people vote the image descriptions generated by our model to be slightly
wittier than human-written witty descriptions. Unsurprisingly, humans are
almost always wittier than the model when they are free to choose the
vocabulary, style, etc.
Comment: NAACL 2018 (11 pages)
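The retrieval approach can be pictured with a toy Python sketch: take tags detected in an image, look up words that sound like them, and return corpus sentences that use one of those sound-alikes. The `HOMOPHONES` table, the `retrieve_puns` helper and the example corpus are hypothetical assumptions for illustration; this is not the paper's retrieval model, which the abstract does not specify in detail.

```python
import re

# Hypothetical homophone table; a real system would derive pairs from a
# pronunciation dictionary rather than hard-code them.
HOMOPHONES = {"sail": ["sale"], "bear": ["bare"], "flour": ["flower"]}


def retrieve_puns(image_tags: list[str], corpus: list[str]) -> list[tuple[str, str, str]]:
    """Return (tag, homophone, sentence) for corpus sentences that contain a homophone of a tag."""
    hits = []
    for tag in image_tags:
        for homophone in HOMOPHONES.get(tag, []):
            for sentence in corpus:
                words = re.findall(r"[a-z']+", sentence.lower())  # crude tokenization
                if homophone in words:
                    hits.append((tag, homophone, sentence))
    return hits


corpus = [
    "Everything in the store is on sale today.",
    "The shelves were bare by noon.",
    "We went sailing at dawn.",
]
# Tags would come from an image tagger; here they are hand-written placeholders.
print(retrieve_puns(["sail", "dog"], corpus))
# [('sail', 'sale', 'Everything in the store is on sale today.')]
```

Matching whole tokens rather than substrings keeps "sailing" from counting as a pun on "sale".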
Complex adaptive systems based data integration: theory and applications
Data Definition Languages (DDLs) have been created and used to represent data in programming languages and in database dictionaries. This representation includes descriptions in the form of data fields and relations in the form of a hierarchy, with the common exception of relational databases, where relations are flat. Network computing created an environment that enables relatively easy and inexpensive exchange of data. What followed was the creation of new DDLs claiming better support for automatic data integration. It is unclear from the literature whether any real progress has been made toward achieving an ideal state or limit condition of automatic data integration. This research asserts that difficulties in accomplishing integration are indicative of socio-cultural systems in general and are caused by some measurable attributes common in DDLs. This research’s main contributions are: (1) a theory of data integration requirements to fully support automatic data integration from autonomous heterogeneous data sources; (2) the identification of related, measurable abstract attributes (Variety, Tension, and Entropy); (3) the development of tools to measure them. The research uses a multi-theoretic lens to define and articulate these attributes and their measurements. The proposed theory is founded on the Law of Requisite Variety, Information Theory, Complex Adaptive Systems (CAS) theory, Sowa’s Meaning Preservation framework and Zipf distributions of words and meanings. Using the theory, the attributes, and their measures, this research proposes a framework for objectively evaluating the suitability of any data definition language with respect to degrees of automatic data integration.
This research uses thirteen data structures constructed with various DDLs from the 1960s to date. No DDL examined (and therefore no DDL similar to those examined) is designed to satisfy the Law of Requisite Variety. No DDL examined is designed to support CAS evolutionary processes that could result in fully automated integration of heterogeneous data sources. There is no significant difference in measures of Variety, Tension, and Entropy among the DDLs investigated in this research. A direction for overcoming the common limitations discovered in this research is suggested and tested by proposing GlossoMote, a theoretical, mathematically sound description language that satisfies the requirements of the data integration theory. GlossoMote is not merely a new syntax; it is a drastic departure from existing DDL constructs. The feasibility of the approach is demonstrated with a small-scale experiment and evaluated using the proposed assessment framework and other means. The results are promising, but additional research is required to evaluate the commercial potential of GlossoMote’s approach.
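The abstract names Entropy as a measurable attribute grounded in Information Theory but does not give its formula. As a hedged illustration only, the Python sketch below computes plain Shannon entropy (in bits) over the field names of a data structure; treating field names as the symbols, and the two toy schemas, are assumptions for this example and not the measure defined in this research.

```python
import math
from collections import Counter


def shannon_entropy(symbols: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    # H = sum over distinct symbols of -p * log2(p), with p = count / n.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical field names from two toy schemas (placeholders, not the study's data).
schema_a = ["id", "name", "id", "date", "name", "id"]
schema_b = ["cust_id", "cust_name", "order_date", "order_total"]

print(round(shannon_entropy(schema_a), 3))
print(shannon_entropy(schema_b))  # 2.0 (four equally likely names)
```

Repeated names lower the entropy, while many distinct, equally frequent names raise it; any DDL-level measure would still need a principled choice of what counts as a symbol.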