A Non-Parametric Learning Approach to Identify Online Human Trafficking
Human trafficking is among the most challenging law enforcement problems,
demanding a persistent fight across the globe. In this study, we leverage
readily available data from "Backpage", a classified-advertisement website, to
discern potential patterns of human trafficking activity that manifest online
and to identify the advertisements most likely to be trafficking-related. Due
to the lack of ground truth, we rely on two human analysts, one a survivor of
human trafficking and one from law enforcement, to hand-label a small portion
of the crawled data. We then present a semi-supervised learning approach that
is trained on the available labeled and unlabeled data and evaluated on unseen
data, with further verification by experts. Comment: Accepted in IEEE
Intelligence and Security Informatics 2016 Conference (ISI 2016)
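The abstract does not specify the learning algorithm. As a hedged illustration of how a non-parametric, semi-supervised method can combine a few hand-labeled advertisements with many unlabeled ones, the sketch below runs basic label propagation over a k-nearest-neighbour graph. The feature vectors, `k`, the iteration count, and the helper names (`knn_graph`, `propagate_labels`) are hypothetical; this is a generic technique, not the authors' method.

```python
import math

# Hypothetical helpers: a generic label-propagation sketch, not the paper's model.

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_graph(points, k):
    """For each point, return the indices of its k nearest other points."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in ranked[:k]])
    return graph

def propagate_labels(points, labels, k=2, iterations=20):
    """Label propagation over a kNN graph.

    `labels` holds 1 (trafficking-related), 0 (not) or None (unlabeled).
    Returns a score in [0, 1] for every advertisement.
    """
    graph = knn_graph(points, k)
    # Hand-labeled ads start at their label; unlabeled ads start undecided (0.5).
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout.
        scores = [
            float(y) if y is not None  # clamp the analysts' labels
            else sum(scores[j] for j in graph[i]) / len(graph[i])  # neighbour mean
            for i, y in enumerate(labels)
        ]
    return scores

# Toy 2-D feature vectors (hypothetical); one hand-labeled ad per cluster.
ads = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = [1, None, None, 0, None, None]
scores = propagate_labels(ads, labels)
```

On this toy input, the two unlabeled ads near the ad labeled 1 converge to scores close to 1, and the two near the ad labeled 0 converge to scores close to 0. Real inputs would be feature vectors derived from the ad text, which this sketch does not model.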
Hotels-50K: A Global Hotel Recognition Dataset
Recognizing a hotel from an image of a hotel room is important for human
trafficking investigations. Images directly link victims to places and can help
verify where victims have been trafficked, and where their traffickers might
move them or others in the future. Recognizing the hotel from images is
challenging because of low image quality, uncommon camera perspectives, large
occlusions (often the victim), and the similarity of objects (e.g., furniture,
art, bedding) across different hotel rooms.
To support efforts towards this hotel recognition task, we have curated a
dataset of over 1 million annotated hotel room images from 50,000 hotels. These
images include professionally captured photographs from travel websites and
crowd-sourced images from a mobile application, which are more similar to the
types of images analyzed in real-world investigations. We present a baseline
approach based on a standard network architecture and a collection of
data-augmentation approaches tuned to this problem domain.
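The abstract names large occlusions (often the victim) as a key difficulty but does not list the augmentations used. One plausible augmentation for that problem, shown here purely as an assumption in the spirit of "random erasing" or "cutout", blanks a random rectangle of each training image so the model learns to rely on the visible parts of the room. The function name `random_occlusion`, the fill value, and the maximum occluded fraction are hypothetical; images are plain nested lists of grey-scale pixels so the sketch has no dependencies.

```python
import random

def random_occlusion(image, max_fraction=0.5, fill=0, rng=random):
    """Hypothetical augmentation: return a copy of `image` (a list of pixel
    rows) with one random rectangle set to `fill`. Each side of the rectangle
    is at most `max_fraction` of the corresponding image dimension."""
    height, width = len(image), len(image[0])
    occ_h = rng.randint(1, max(1, int(height * max_fraction)))
    occ_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - occ_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - occ_w)
    out = [row[:] for row in image]        # copy so the original stays untouched
    for r in range(top, top + occ_h):
        for c in range(left, left + occ_w):
            out[r][c] = fill
    return out

# Usage on a uniform 6x8 "room" image with a seeded generator.
rng = random.Random(0)
room = [[200] * 8 for _ in range(6)]
augmented = random_occlusion(room, rng=rng)
```

Applying a fresh random occlusion every time an image is sampled during training exposes the network to many partial views of the same room, which is the intuition behind this family of augmentations.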
Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have "long tails",
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18% F-Measure on five annotated
real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift and can be efficiently bootstrapped even in a serial
computing environment. Comment: 10 pages, ACM WWW 201
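The abstract describes learning extractors from raw unlabeled text plus a handful of seed annotations, without hand-crafted features. A minimal sketch of that general idea, under loud assumptions: represent each word by counts of the words around it in an unlabeled corpus (a crude distributional vector standing in for learned embeddings), then accept a candidate as a value of the attribute (here, a city) when its vector is close enough to the seeds' combined vector. The corpus, window size, threshold, and every function name are hypothetical; this is not the paper's model.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def extract(candidates, seeds, vectors, threshold=0.5):
    """Keep candidates whose context vector is similar to the summed seed vectors."""
    centroid = Counter()
    for s in seeds:
        centroid.update(vectors[s])
    return [c for c in candidates if cosine(vectors[c], centroid) >= threshold]

# Hypothetical unlabeled corpus and two seed annotations for a "city" attribute.
corpus = [
    "new in town visiting boston this week",
    "new in town visiting denver this week",
    "call me visiting chicago this week",
    "sweet and friendly call me now",
    "friendly and fun call me now",
]
vectors = context_vectors(corpus)
cities = extract(["chicago", "friendly", "sweet"], ["boston", "denver"], vectors)
# → ['chicago']
```

Because only the seeds and raw text are needed, the same procedure can be rerun cheaply on newly crawled pages, which is one way such an approach can cope with concept drift.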
MapSDI: A Scaled-up Semantic Data Integration Framework for Knowledge Graph Creation
Semantic web technologies have contributed effective solutions to the
problems of data integration and knowledge graph creation. However, with the
rapid growth of big data in diverse domains, various interoperability issues
still need to be addressed, with scalability being one of the main challenges.
In this paper, we address the problem of knowledge graph
creation at scale and provide MapSDI, a mapping rule-based framework for
optimizing semantic data integration into knowledge graphs. MapSDI enables the
efficient semantic enrichment of large, heterogeneous, and potentially
low-quality data. The input of MapSDI is a set of data sources and mapping
rules expressed in a mapping language such as RML. First, MapSDI pre-processes
the sources based on semantic information extracted from the mapping rules by
applying basic database operators: it projects the required attributes,
eliminates duplicates, and selects relevant entries. All these operators are
defined based on the knowledge encoded in the mapping rules, which are then
used by the semantification engine (or RDFizer) to produce a knowledge graph.
We have empirically studied the impact of MapSDI on existing RDFizers and
observed that knowledge graph creation time can be reduced by one order of
magnitude on average. We also show theoretically that the source and rule
transformations provided by MapSDI are data-lossless.
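The pre-processing step described above (projection, duplicate elimination, and selection, each driven by the mapping rules) can be sketched in a few lines. The rule dictionary is a simplified, hypothetical stand-in for an RML triples map (a subject template, a subject column, and predicate/column pairs), not actual RML syntax or MapSDI's implementation; `rdfize` only illustrates what an RDFizer then does with the reduced source, and the `example.org` IRIs are placeholders.

```python
def required_attributes(rule):
    """Collect the source columns that a (hypothetical) mapping rule refers to."""
    cols = {rule["subject_column"]}
    cols.update(column for _, column in rule["predicate_objects"])
    return sorted(cols)

def preprocess(rows, rule):
    """Select, project and deduplicate source rows according to `rule`."""
    cols = required_attributes(rule)
    seen, reduced = set(), []
    for row in rows:
        # Selection: drop entries the rule cannot map (no subject value).
        if not row.get(rule["subject_column"]):
            continue
        # Projection: keep only the attributes the rule uses.
        projected = tuple((c, row.get(c)) for c in cols)
        # Duplicate elimination: identical projected rows yield identical triples.
        if projected not in seen:
            seen.add(projected)
            reduced.append(dict(projected))
    return reduced

def rdfize(rows, rule):
    """Toy RDFizer: turn each row into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        if not row.get(rule["subject_column"]):
            continue  # no subject value, so no triples
        subject = rule["subject_template"].format(row[rule["subject_column"]])
        for predicate, column in rule["predicate_objects"]:
            if row.get(column) is not None:
                triples.add((subject, predicate, row[column]))
    return triples

# Hypothetical rule and source: "notes" is never mapped, two rows are duplicates
# once projected, and one row has no subject value.
rule = {
    "subject_template": "http://example.org/item/{}",
    "subject_column": "id",
    "predicate_objects": [("ex:name", "name")],
}
source = [
    {"id": "d1", "name": "Aspirin", "notes": "long free text"},
    {"id": "d1", "name": "Aspirin", "notes": "other text"},
    {"id": "", "name": "Unknown", "notes": ""},
    {"id": "d2", "name": "Ibuprofen", "notes": None},
]
reduced = preprocess(source, rule)
```

Here the RDFizer receives two small rows instead of four wide ones, yet produces the same set of triples. That equality holds only for this toy rule and data; it illustrates, but does not prove, the data-lossless property claimed in the abstract.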
A Survey of Operations Research and Analytics Literature Related to Anti-Human Trafficking
Human trafficking is a compound social, economic, and human rights issue
occurring in all regions of the world. Understanding and addressing such a
complex crime requires effort from multiple domains and perspectives. As of
this writing, no systematic review exists of the Operations Research and
Analytics literature applied to the domain of human trafficking. The purpose of
this work is to fill this gap through a systematic literature review. The
studies matching our search criteria range from 2010 to March 2021. These
studies were gathered and analyzed to help answer the following three research
questions: (i) What aspects of human trafficking are being studied by
Operations Research and Analytics researchers? (ii) What Operations Research
and Analytics methods are being applied in the anti-human trafficking domain?
and (iii) What are the existing research gaps associated with (i) and (ii)? By
answering these questions, we illuminate the extent to which these topics have
been addressed in the literature and point to future research
opportunities in applying analytical methods to advance the fight against human
trafficking. Comment: 28 pages, 6 Figures, 2 Tables
LEAPME: Learning-based Property Matching with Embeddings
Data integration tasks such as the creation and extension of knowledge graphs
involve the fusion of heterogeneous entities from many sources. Matching and
fusing such entities also requires matching and combining their properties
(attributes). However, previous schema matching approaches mostly focus on only
two sources and often rely on simple similarity measurements. They thus face
problems in challenging use cases such as the integration of heterogeneous
product entities from many sources.
We therefore present a new machine learning-based property matching approach
called LEAPME (LEArning-based Property Matching with Embeddings) that utilizes
numerous features of both property names and instance values. The approach
makes heavy use of word embeddings to better utilize the domain-specific
semantics of both property names and instance values. The use of supervised
machine learning helps exploit the predictive power of word embeddings.
Our comparative evaluation against five baselines for several multi-source
datasets with real-world data shows the high effectiveness of LEAPME. We also
show that our approach is effective even when training data from another domain
is used (transfer learning).
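LEAPME's exact feature set and classifier are not given in this abstract. The sketch below only illustrates the general recipe it describes: build a feature vector for each pair of properties from their names and instance values, using word embeddings, and train a supervised classifier on labeled match/non-match pairs. The three-dimensional toy embeddings, the three features, the plain logistic regression trained by stochastic gradient descent, and all names are assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical 3-d toy embeddings; a real system would load pre-trained vectors.
EMB = {
    "price": [0.9, 0.1, 0.0], "cost": [0.85, 0.15, 0.05],
    "color": [0.0, 0.9, 0.1], "colour": [0.05, 0.88, 0.1],
    "usd": [0.8, 0.0, 0.2], "eur": [0.75, 0.05, 0.2],
    "red": [0.1, 0.8, 0.2], "blue": [0.0, 0.85, 0.25],
}

def embed(words):
    """Average the embeddings of known words (zero vector if none are known)."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def features(prop_a, prop_b):
    """A property is (name, list of instance-value tokens)."""
    (name_a, values_a), (name_b, values_b) = prop_a, prop_b
    return [
        SequenceMatcher(None, name_a, name_b).ratio(),  # string similarity of names
        cosine(embed([name_a]), embed([name_b])),       # embedding similarity of names
        cosine(embed(values_a), embed(values_b)),       # embedding similarity of values
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def match_probability(w, b, prop_a, prop_b):
    x = features(prop_a, prop_b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical labeled property pairs from two product sources (1 = same property).
price, cost = ("price", ["usd"]), ("cost", ["eur"])
color, colour = ("color", ["red"]), ("colour", ["blue"])
pairs = [(price, cost, 1), (color, colour, 1), (price, colour, 0), (color, cost, 0)]
X = [features(p, q) for p, q, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_logreg(X, y)
```

Note that "price" and "cost" share almost no characters, so a pure string-similarity matcher would miss them; the embedding features are what let the classifier separate matches from non-matches here, which mirrors the motivation given in the abstract.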