39,348 research outputs found
Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have `long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18\% F-Measure on five annotated
sets of real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.Comment: 10 pages, ACM WWW 201
Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach
Many algorithms in workflow scheduling and resource provisioning rely on the
performance estimation of tasks to produce a scheduling plan. A profiler that
is capable of modeling the execution of tasks and predicting their runtime
accurately, therefore, becomes an essential part of any Workflow Management
System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS)
platforms that use clouds for deploying scientific workflows, task runtime
prediction becomes more challenging because it requires the processing of a
significant amount of data in a near real-time scenario while dealing with the
performance variability of cloud resources. Hence, relying on methods such as
profiling tasks' execution data using basic statistical description (e.g.,
mean, standard deviation) or batch offline regression techniques to estimate
the runtime may not be suitable for such environments. In this paper, we
propose an online incremental learning approach to predict the runtime of tasks
in scientific workflows in clouds. To improve the performance of the
predictions, we harness fine-grained resources monitoring data in the form of
time-series records of CPU utilization, memory usage, and I/O activities that
are reflecting the unique characteristics of a task's execution. We compare our
solution to a state-of-the-art approach that exploits the resources monitoring
data based on regression machine learning technique. From our experiments, the
proposed strategy improves the performance, in terms of the error, up to
29.89%, compared to the state-of-the-art solutions.Comment: Accepted for presentation at main conference track of 11th IEEE/ACM
International Conference on Utility and Cloud Computin
Building with Drones: Accurate 3D Facade Reconstruction using MAVs
Automatic reconstruction of 3D models from images using multi-view
Structure-from-Motion methods has been one of the most fruitful outcomes of
computer vision. These advances combined with the growing popularity of Micro
Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools
ubiquitous for large number of Architecture, Engineering and Construction
applications among audiences, mostly unskilled in computer vision. However, to
obtain high-resolution and accurate reconstructions from a large-scale object
using SfM, there are many critical constraints on the quality of image data,
which often become sources of inaccuracy as the current 3D reconstruction
pipelines do not facilitate the users to determine the fidelity of input data
during the image acquisition. In this paper, we present and advocate a
closed-loop interactive approach that performs incremental reconstruction in
real-time and gives users an online feedback about the quality parameters like
Ground Sampling Distance (GSD), image redundancy, etc on a surface mesh. We
also propose a novel multi-scale camera network design to prevent scene drift
caused by incremental map building, and release the first multi-scale image
sequence dataset as a benchmark. Further, we evaluate our system on real
outdoor scenes, and show that our interactive pipeline combined with a
multi-scale camera network approach provides compelling accuracy in multi-view
reconstruction tasks when compared against the state-of-the-art methods.Comment: 8 Pages, 2015 IEEE International Conference on Robotics and
Automation (ICRA '15), Seattle, WA, US
- …