30,266 research outputs found
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Entity Resolution (ER) aims to identify different descriptions in various
Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the
Variety, Volume and Veracity of entity descriptions published in the Web of
Data. To address them, we propose the MinoanER framework that simultaneously
fulfills full automation, support of highly heterogeneous entities, and massive
parallelization of the ER process. MinoanER leverages a token-based similarity
of entities to define a new metric that derives the similarity of neighboring
entities from the most important relations, as they are indicated only by
statistics. A composite blocking method is employed to capture different
sources of matching evidence from the content, neighbors, or names of entities.
The search space of candidate pairs for comparison is compactly abstracted by a
novel disjunctive blocking graph and processed by a non-iterative, massively
parallel matching algorithm that consists of four generic, schema-agnostic
matching rules that are quite robust with respect to their internal
configuration. We demonstrate that the effectiveness of MinoanER is comparable
to existing ER tools over real KBs exhibiting low Variety, but it outperforms
them significantly when matching KBs with high Variety.Comment: Presented at EDBT 2001
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
End-to-End Entity Resolution for Big Data: A Survey
One of the most important tasks for improving data quality and the
reliability of data analytics results is Entity Resolution (ER). ER aims to
identify different descriptions that refer to the same real-world entity, and
remains a challenging problem. While previous works have studied specific
aspects of ER (and mostly in traditional settings), in this survey, we provide
for the first time an end-to-end view of modern ER workflows, and of the novel
aspects of entity indexing and matching methods in order to cope with more than
one of the Big Data characteristics simultaneously. We present the basic
concepts, processing steps and execution strategies that have been proposed by
different communities, i.e., database, semantic Web and machine learning, in
order to cope with the loose structuredness, extreme diversity, high speed and
large scale of entity descriptions used by real-world applications. Finally, we
provide a synthetic discussion of the existing approaches, and conclude with a
detailed presentation of open research directions
TZC: Efficient Inter-Process Communication for Robotics Middleware with Partial Serialization
Inter-process communication (IPC) is one of the core functions of modern
robotics middleware. We propose an efficient IPC technique called TZC (Towards
Zero-Copy). As a core component of TZC, we design a novel algorithm called
partial serialization. Our formulation can generate messages that can be
divided into two parts. During message transmission, one part is transmitted
through a socket and the other part uses shared memory. The part within shared
memory is never copied or serialized during its lifetime. We have integrated
TZC with ROS and ROS2 and find that TZC can be easily combined with current
open-source platforms. By using TZC, the overhead of IPC remains constant when
the message size grows. In particular, when the message size is 4MB (less than
the size of a full HD image), TZC can reduce the overhead of ROS IPC from tens
of milliseconds to hundreds of microseconds and can reduce the overhead of ROS2
IPC from hundreds of milliseconds to less than 1 millisecond. We also
demonstrate the benefits of TZC by integrating with TurtleBot2 that are used in
autonomous driving scenarios. We show that by using TZC, the braking distance
can be shortened by 16% than ROS
Self-Control in Cyberspace: Applying Dual Systems Theory to a Review of Digital Self-Control Tools
Many people struggle to control their use of digital devices. However, our
understanding of the design mechanisms that support user self-control remains
limited. In this paper, we make two contributions to HCI research in this
space: first, we analyse 367 apps and browser extensions from the Google Play,
Chrome Web, and Apple App stores to identify common core design features and
intervention strategies afforded by current tools for digital self-control.
Second, we adapt and apply an integrative dual systems model of self-regulation
as a framework for organising and evaluating the design features found. Our
analysis aims to help the design of better tools in two ways: (i) by
identifying how, through a well-established model of self-regulation, current
tools overlap and differ in how they support self-control; and (ii) by using
the model to reveal underexplored cognitive mechanisms that could aid the
design of new tools.Comment: 11.5 pages (excl. references), 6 figures, 1 tabl
SICStus MT - A Multithreaded Execution Environment for SICStus Prolog
The development of intelligent software agents and other
complex applications which continuously interact with their
environments has been one of the reasons why explicit concurrency has
become a necessity in a modern Prolog system today. Such applications
need to perform several tasks which may be very different with respect
to how they are implemented in Prolog. Performing these tasks
simultaneously is very tedious without language support.
This paper describes the design, implementation and evaluation of a
prototype multithreaded execution environment for SICStus Prolog. The
threads are dynamically managed using a small and compact set of
Prolog primitives implemented in a portable way, requiring almost no
support from the underlying operating system
- …