818 research outputs found

    XPath-based information extraction

    Get PDF

    Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach

    Full text link
    Finding joinable tables in data lakes is key procedure in many applications such as data integration, data augmentation, data analysis, and data market. Traditional approaches that find equi-joinable tables are unable to deal with misspellings and different formats, nor do they capture any semantic joins. In this paper, we propose PEXESO, a framework for joinable table discovery in data lakes. We embed textual values as high-dimensional vectors and join columns under similarity predicates on high-dimensional vectors, hence to address the limitations of equi-join approaches and identify more meaningful results. To efficiently find joinable tables with similarity, we propose a block-and-verify method that utilizes pivot-based filtering. A partitioning technique is developed to cope with the case when the data lake is large and the index cannot fit in main memory. An experimental evaluation on real datasets shows that our solution identifies substantially more tables than equi-joins and outperforms other similarity-based options, and the join results are useful in data enrichment for machine learning tasks. The experiments also demonstrate the efficiency of the proposed method.Comment: Full version of paper in ICDE 202

    Structural knowledge learning from maps for supervised land cover/use classification: Application to the monitoring of land cover/use maps in French Guiana

    Get PDF
    International audienceThe number of satellites and sensors devoted to Earth observation has become increasingly elevated, delivering extensive data, especially images. At the same time, the access to such data and the tools needed to process them has considerably improved. In the presence of such data flow, we need automatic image interpretation methods, especially when it comes to the monitoring and prediction of environmental and societal changes in highly dynamic socio-environmental contexts. This could be accomplished via artificial intelligence. The concept described here relies on the induction of classification rules that explicitly take into account structural knowledge, using Aleph, an Inductive Logic Programming (ILP) system, combined with a multi-class classification procedure. This methodology was used to monitor changes in land cover/use of the French Guiana coastline. One hundred and fifty-eight classification rules were induced from 3 diachronic land cover/use maps including 38 classes. These rules were expressed in first order logic language, which makes them easily understandable by non-experts. A 10-fold cross-validation gave significant average values of 84.62%, 99.57% and 77.22% for classification accuracy, specificity and sensitivity, respectively. Our methodology could be beneficial to automatically classify new objects and to facilitate object-based classification procedures

    Dwelling on ontology - semantic reasoning over topographic maps

    Get PDF
    The thesis builds upon the hypothesis that the spatial arrangement of topographic features, such as buildings, roads and other land cover parcels, indicates how land is used. The aim is to make this kind of high-level semantic information explicit within topographic data. There is an increasing need to share and use data for a wider range of purposes, and to make data more definitive, intelligent and accessible. Unfortunately, we still encounter a gap between low-level data representations and high-level concepts that typify human qualitative spatial reasoning. The thesis adopts an ontological approach to bridge this gap and to derive functional information by using standard reasoning mechanisms offered by logic-based knowledge representation formalisms. It formulates a framework for the processes involved in interpreting land use information from topographic maps. Land use is a high-level abstract concept, but it is also an observable fact intimately tied to geography. By decomposing this relationship, the thesis correlates a one-to-one mapping between high-level conceptualisations established from human knowledge and real world entities represented in the data. Based on a middle-out approach, it develops a conceptual model that incrementally links different levels of detail, and thereby derives coarser, more meaningful descriptions from more detailed ones. The thesis verifies its proposed ideas by implementing an ontology describing the land use ‘residential area’ in the ontology editor ProtĂ©gĂ©. By asserting knowledge about high-level concepts such as types of dwellings, urban blocks and residential districts as well as individuals that link directly to topographic features stored in the database, the reasoner successfully infers instances of the defined classes. Despite current technological limitations, ontologies are a promising way forward in the manner we handle and integrate geographic data, especially with respect to how humans conceptualise geographic space

    On the role of Computational Logic in Data Science: representing, learning, reasoning, and explaining knowledge

    Get PDF
    In this thesis we discuss in what ways computational logic (CL) and data science (DS) can jointly contribute to the management of knowledge within the scope of modern and future artificial intelligence (AI), and how technically-sound software technologies can be realised along the path. An agent-oriented mindset permeates the whole discussion, by stressing pivotal role of autonomous agents in exploiting both means to reach higher degrees of intelligence. Accordingly, the goals of this thesis are manifold. First, we elicit the analogies and differences among CL and DS, hence looking for possible synergies and complementarities along 4 major knowledge-related dimensions, namely representation, acquisition (a.k.a. learning), inference (a.k.a. reasoning), and explanation. In this regard, we propose a conceptual framework through which bridges these disciplines can be described and designed. We then survey the current state of the art of AI technologies, w.r.t. their capability to support bridging CL and DS in practice. After detecting lacks and opportunities, we propose the notion of logic ecosystem as the new conceptual, architectural, and technological solution supporting the incremental integration of symbolic and sub-symbolic AI. Finally, we discuss how our notion of logic ecosys- tem can be reified into actual software technology and extended towards many DS-related directions

    On relational learning and discovery in social networks: a survey

    Get PDF
    The social networking scene has evolved tremendously over the years. It has grown in relational complexities that extend a vast presence onto popular social media platforms on the internet. With the advance of sentimental computing and social complexity, relationships which were once thought to be simple have now become multi-dimensional and widespread in the online scene. This explosion in the online social scene has attracted much research attention. The main aims of this work revolve around the knowledge discovery and datamining processes of these feature-rich relations. In this paper, we provide a survey of relational learning and discovery through popular social analysis of different structure types which are integral to applications within the emerging field of sentimental and affective computing. It is hoped that this contribution will add to the clarity of how social networks are analyzed with the latest groundbreaking methods and provide certain directions for future improvements

    Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

    Full text link
    Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.Comment: Accepted to NeurIPS 202
    • 

    corecore