Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have 'long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18% F-Measure on five annotated
sets of real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.
Comment: 10 pages, ACM WWW 201
QCD Spin Physics: Partonic Spin Structure of the Nucleon
We discuss some recent developments concerning the nucleon's helicity parton
distribution functions: New preliminary data from jet production at RHIC
suggest for the first time a non-vanishing polarization of gluons in the
nucleon. SIDIS measurements at COMPASS provide better constraints on the
strange and light sea quark helicity distributions. Single-longitudinal spin
asymmetries in W-boson production have been observed at RHIC and will
ultimately give new insights into the light quark and anti-quark helicity
structure of the nucleon.
Comment: Talk presented at the "International School of Nuclear Physics, 33rd
Course: From Quarks and Gluons to Hadrons and Nuclei", Erice, Italy, 16 - 24
September 2011; 12 pages, 9 figures
Supervised Typing of Big Graphs using Semantic Embeddings
We propose a supervised algorithm for generating type embeddings in the same
semantic vector space as a given set of entity embeddings. The algorithm is
agnostic to the derivation of the underlying entity embeddings. It does not
require any manual feature engineering, generalizes well to hundreds of types
and achieves near-linear scaling on Big Graphs containing many millions of
triples and instances by virtue of an incremental execution. We demonstrate the
utility of the embeddings on a type recommendation task, outperforming a
non-parametric feature-agnostic baseline while achieving 15x speedup and
near-constant memory usage on a full partition of DBpedia. Using
state-of-the-art visualization, we illustrate the agreement of our
extensionally derived DBpedia type embeddings with the manually curated domain
ontology. Finally, we use the embeddings to probabilistically cluster about 4
million DBpedia instances into 415 types in the DBpedia ontology.
Comment: 6 pages, to be published in Semantic Big Data Workshop at ACM, SIGMOD
2017; extended version in preparation for Open Journal of Semantic Web (OJSW)
A hyperboloidal study of tail decay rates for scalar and Yang-Mills fields
We investigate the asymptotic behavior of spherically symmetric solutions to
scalar wave and Yang-Mills equations on a Schwarzschild background. The studies
demonstrate the astrophysical relevance of null infinity in predicting
radiation signals for gravitational wave detectors and show how test fields on
unbounded domains in black hole spacetimes can be simulated conveniently by
numerically solving hyperboloidal initial value problems.
Comment: 8 pages, 7 figures