2,443 research outputs found
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives of the information
examples. We present an empirically derived methodology for efficiently
gathering of ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators.Comment: in publication at the Semantic Web Journa
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
A growing interest in tasks involving language understanding by the NLP
community has led to the need for effective semantic parsing and inference.
Modern NLP systems use semantic representations that do not quite fulfill the
nuanced needs for language understanding: adequately modeling language
semantics, enabling general inferences, and being accurately recoverable. This
document describes underspecified logical forms (ULF) for Episodic Logic (EL),
which is an initial form for a semantic representation that balances these
needs. ULFs fully resolve the semantic type structure while leaving issues such
as quantifier scope, word sense, and anaphora unresolved; they provide a
starting point for further resolution into EL, and enable certain structural
inferences without further resolution. This document also presents preliminary
results of creating a hand-annotated corpus of ULFs for the purpose of training
a precise ULF parser, showing a three-person pairwise interannotator agreement
of 0.88 on confident annotations. We hypothesize that a divide-and-conquer
approach to semantic parsing starting with derivation of ULFs will lead to
semantic analyses that do justice to subtle aspects of linguistic meaning, and
will enable construction of more accurate semantic parsers.Comment: Accepted for publication at The 13th International Conference on
Computational Semantics (IWCS 2019
Mining Threat Intelligence about Open-Source Projects and Libraries from Code Repository Issues and Bug Reports
Open-Source Projects and Libraries are being used in software development
while also bearing multiple security vulnerabilities. This use of third party
ecosystem creates a new kind of attack surface for a product in development. An
intelligent attacker can attack a product by exploiting one of the
vulnerabilities present in linked projects and libraries.
In this paper, we mine threat intelligence about open source projects and
libraries from bugs and issues reported on public code repositories. We also
track library and project dependencies for installed software on a client
machine. We represent and store this threat intelligence, along with the
software dependencies in a security knowledge graph. Security analysts and
developers can then query and receive alerts from the knowledge graph if any
threat intelligence is found about linked libraries and projects, utilized in
their products
- …