    Information Extraction in Illicit Domains

    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have "long tails" and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment. Comment: 10 pages, ACM WWW 201

    QCD Spin Physics: Partonic Spin Structure of the Nucleon

    We discuss some recent developments concerning the nucleon's helicity parton distribution functions: New preliminary data from jet production at RHIC suggest for the first time a non-vanishing polarization of gluons in the nucleon. SIDIS measurements at COMPASS provide better constraints on the strange and light sea quark helicity distributions. Single-longitudinal spin asymmetries in W-boson production have been observed at RHIC and will ultimately give new insights into the light quark and anti-quark helicity structure of the nucleon. Comment: Talk presented at the "International School of Nuclear Physics, 33rd Course: From Quarks and Gluons to Hadrons and Nuclei", Erice, Italy, 16 - 24 September 2011; 12 pages, 9 figures

    Supervised Typing of Big Graphs using Semantic Embeddings

    We propose a supervised algorithm for generating type embeddings in the same semantic vector space as a given set of entity embeddings. The algorithm is agnostic to the derivation of the underlying entity embeddings. It does not require any manual feature engineering, generalizes well to hundreds of types and achieves near-linear scaling on Big Graphs containing many millions of triples and instances by virtue of an incremental execution. We demonstrate the utility of the embeddings on a type recommendation task, outperforming a non-parametric feature-agnostic baseline while achieving 15x speedup and near-constant memory usage on a full partition of DBpedia. Using state-of-the-art visualization, we illustrate the agreement of our extensionally derived DBpedia type embeddings with the manually curated domain ontology. Finally, we use the embeddings to probabilistically cluster about 4 million DBpedia instances into 415 types in the DBpedia ontology. Comment: 6 pages, to be published in Semantic Big Data Workshop at ACM, SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW)

    A hyperboloidal study of tail decay rates for scalar and Yang-Mills fields

    We investigate the asymptotic behavior of spherically symmetric solutions to scalar wave and Yang-Mills equations on a Schwarzschild background. The studies demonstrate the astrophysical relevance of null infinity in predicting radiation signals for gravitational wave detectors and show how test fields on unbounded domains in black hole spacetimes can be simulated conveniently by numerically solving hyperboloidal initial value problems. Comment: 8 pages, 7 figures