818 research outputs found
Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach
Finding joinable tables in data lakes is key procedure in many applications
such as data integration, data augmentation, data analysis, and data market.
Traditional approaches that find equi-joinable tables are unable to deal with
misspellings and different formats, nor do they capture any semantic joins. In
this paper, we propose PEXESO, a framework for joinable table discovery in data
lakes. We embed textual values as high-dimensional vectors and join columns
under similarity predicates on high-dimensional vectors, hence to address the
limitations of equi-join approaches and identify more meaningful results. To
efficiently find joinable tables with similarity, we propose a block-and-verify
method that utilizes pivot-based filtering. A partitioning technique is
developed to cope with the case when the data lake is large and the index
cannot fit in main memory. An experimental evaluation on real datasets shows
that our solution identifies substantially more tables than equi-joins and
outperforms other similarity-based options, and the join results are useful in
data enrichment for machine learning tasks. The experiments also demonstrate
the efficiency of the proposed method.Comment: Full version of paper in ICDE 202
Structural knowledge learning from maps for supervised land cover/use classification: Application to the monitoring of land cover/use maps in French Guiana
International audienceThe number of satellites and sensors devoted to Earth observation has become increasingly elevated, delivering extensive data, especially images. At the same time, the access to such data and the tools needed to process them has considerably improved. In the presence of such data flow, we need automatic image interpretation methods, especially when it comes to the monitoring and prediction of environmental and societal changes in highly dynamic socio-environmental contexts. This could be accomplished via artificial intelligence. The concept described here relies on the induction of classification rules that explicitly take into account structural knowledge, using Aleph, an Inductive Logic Programming (ILP) system, combined with a multi-class classification procedure. This methodology was used to monitor changes in land cover/use of the French Guiana coastline. One hundred and fifty-eight classification rules were induced from 3 diachronic land cover/use maps including 38 classes. These rules were expressed in first order logic language, which makes them easily understandable by non-experts. A 10-fold cross-validation gave significant average values of 84.62%, 99.57% and 77.22% for classification accuracy, specificity and sensitivity, respectively. Our methodology could be beneficial to automatically classify new objects and to facilitate object-based classification procedures
Dwelling on ontology - semantic reasoning over topographic maps
The thesis builds upon the hypothesis that the spatial arrangement of topographic
features, such as buildings, roads and other land cover parcels, indicates how land is
used. The aim is to make this kind of high-level semantic information explicit within
topographic data. There is an increasing need to share and use data for a wider range of
purposes, and to make data more definitive, intelligent and accessible. Unfortunately,
we still encounter a gap between low-level data representations and high-level concepts
that typify human qualitative spatial reasoning. The thesis adopts an ontological
approach to bridge this gap and to derive functional information by using standard
reasoning mechanisms offered by logic-based knowledge representation formalisms. It
formulates a framework for the processes involved in interpreting land use information
from topographic maps. Land use is a high-level abstract concept, but it is also an
observable fact intimately tied to geography. By decomposing this relationship, the
thesis correlates a one-to-one mapping between high-level conceptualisations
established from human knowledge and real world entities represented in the data.
Based on a middle-out approach, it develops a conceptual model that incrementally
links different levels of detail, and thereby derives coarser, more meaningful
descriptions from more detailed ones. The thesis verifies its proposed ideas by
implementing an ontology describing the land use âresidential areaâ in the ontology
editor Protégé. By asserting knowledge about high-level concepts such as types of
dwellings, urban blocks and residential districts as well as individuals that link directly
to topographic features stored in the database, the reasoner successfully infers instances
of the defined classes. Despite current technological limitations, ontologies are a
promising way forward in the manner we handle and integrate geographic data,
especially with respect to how humans conceptualise geographic space
On the role of Computational Logic in Data Science: representing, learning, reasoning, and explaining knowledge
In this thesis we discuss in what ways computational logic (CL) and data science (DS) can jointly contribute to the management of knowledge within the scope of modern and future artificial intelligence (AI), and how technically-sound software technologies can be realised along the path. An agent-oriented mindset permeates the whole discussion, by stressing pivotal role of autonomous agents in exploiting both means to reach higher degrees of intelligence. Accordingly, the goals of this thesis are manifold. First, we elicit the analogies and differences among CL and DS, hence looking for possible synergies and complementarities along 4 major knowledge-related dimensions, namely representation, acquisition (a.k.a. learning), inference (a.k.a. reasoning), and explanation. In this regard, we propose a conceptual framework through which bridges these disciplines can be described and designed. We then survey the current state of the art of AI technologies, w.r.t. their capability to support bridging CL and DS in practice. After detecting lacks and opportunities, we propose the notion of logic ecosystem as the new conceptual, architectural, and technological solution supporting the incremental integration of symbolic and sub-symbolic AI. Finally, we discuss how our notion of logic ecosys-
tem can be reified into actual software technology and extended towards many DS-related directions
On relational learning and discovery in social networks: a survey
The social networking scene has evolved tremendously over the years. It has grown in relational complexities that extend a vast presence onto popular social media platforms on the internet. With the advance of sentimental computing and social complexity, relationships which were once thought to be simple have now become multi-dimensional and widespread in the online scene. This explosion in the online social scene has attracted much research attention. The main aims of this work revolve around the knowledge discovery and datamining processes of these feature-rich relations. In this paper, we provide a survey of relational learning and discovery through popular social analysis of different structure types which are integral to applications within the emerging field of sentimental and affective computing. It is hoped that this contribution will add to the clarity of how social networks are analyzed with the latest groundbreaking methods and provide certain directions for future improvements
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Images contain rich relational knowledge that can help machines understand
the world. Existing methods on visual knowledge extraction often rely on the
pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation
types), restricting the expressiveness of the extracted knowledge. In this
work, we take a first exploration to a new paradigm of open visual knowledge
extraction. To achieve this, we present OpenVik which consists of an open
relational region detector to detect regions potentially containing relational
knowledge and a visual knowledge generator that generates format-free knowledge
by prompting the large multimodality model with the detected region of
interest. We also explore two data enhancement techniques for diversifying the
generated format-free visual knowledge. Extensive knowledge quality evaluations
highlight the correctness and uniqueness of the extracted open visual knowledge
by OpenVik. Moreover, integrating our extracted knowledge across various visual
reasoning applications shows consistent improvements, indicating the real-world
applicability of OpenVik.Comment: Accepted to NeurIPS 202
- âŠ