80,754 research outputs found
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Knowledge Graph semantic enhancement of input data for improving AI
Intelligent systems designed using machine learning algorithms require a
large number of labeled data. Background knowledge provides complementary, real
world factual information that can augment the limited labeled data to train a
machine learning algorithm. The term Knowledge Graph (KG) is in vogue as for
many practical applications, it is convenient and useful to organize this
background knowledge in the form of a graph. Recent academic research and
implemented industrial intelligent systems have shown promising performance for
machine learning algorithms that combine training data with a knowledge graph.
In this article, we discuss the use of relevant KGs to enhance input data for
two applications that use machine learning -- recommendation and community
detection. The KG improves both accuracy and explainability
Moving Object Trajectories Meta-Model And Spatio-Temporal Queries
In this paper, a general moving object trajectories framework is put forward
to allow independent applications processing trajectories data benefit from a
high level of interoperability, information sharing as well as an efficient
answer for a wide range of complex trajectory queries. Our proposed meta-model
is based on ontology and event approach, incorporates existing presentations of
trajectory and integrates new patterns like space-time path to describe
activities in geographical space-time. We introduce recursive Region of
Interest concepts and deal mobile objects trajectories with diverse
spatio-temporal sampling protocols and different sensors available that
traditional data model alone are incapable for this purpose.Comment: International Journal of Database Management Systems (IJDMS) Vol.4,
No.2, April 201
From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org
Despite increasing interest in Syriac studies and growing digital
availability of Syriac texts, there is currently no up-to-date infrastructure
for discovering, identifying, classifying, and referencing works of Syriac
literature. The standard reference work (Baumstark's Geschichte) is over ninety
years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be
accessed only through disparate catalogues and databases. The present article
proposes a tentative data model for Syriaca.org's New Handbook of Syriac
Literature, an open-access digital publication that will serve as both an
authority file for Syriac works and a guide to accessing their manuscript
representations, editions, and translations. The authors hope that by
publishing a draft data model they can receive feedback and incorporate
suggestions into the next stage of the project.Comment: Part of special issue: Computer-Aided Processing of Intertextuality
in Ancient Languages. 15 pages, 4 figure
Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Entity Linking (EL) is the task of automatically identifying entity mentions
in a piece of text and resolving them to a corresponding entity in a reference
knowledge base like Wikipedia. There is a large number of EL tools available
for different types of documents and domains, yet EL remains a challenging task
where the lack of precision on particularly ambiguous mentions often spoils the
usefulness of automated disambiguation results in real applications. A priori
approximations of the difficulty to link a particular entity mention can
facilitate flagging of critical cases as part of semi-automated EL systems,
while detecting latent factors that affect the EL performance, like
corpus-specific features, can provide insights on how to improve a system based
on the special characteristics of the underlying corpus. In this paper, we
first introduce a consensus-based method to generate difficulty labels for
entity mentions on arbitrary corpora. The difficulty labels are then exploited
as training data for a supervised classification task able to predict the EL
difficulty of entity mentions using a variety of features. Experiments over a
corpus of news articles show that EL difficulty can be estimated with high
accuracy, revealing also latent features that affect EL performance. Finally,
evaluation results demonstrate the effectiveness of the proposed method to
inform semi-automated EL pipelines.Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium On Applied Computing (SAC 2019
SLIM : Scalable Linkage of Mobility Data
We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup
- …