43 research outputs found

    Ontology Pattern-Based Data Integration

    Data integration is concerned with providing unified access to data residing at multiple sources. Such unified access is realized by a global schema and a set of mappings between the global schema and the local schema of each data source, which specify how user queries over the global schema can be translated into queries over the local schemas. Data sources are typically developed and maintained independently and are thus highly heterogeneous. This makes integration difficult because the sources lack interoperability in architecture, data format, and the syntax and semantics of the data. This dissertation studies how small, self-contained ontologies, called ontology design patterns, can be employed to provide semantic interoperability in a cross-repository data integration system. The idea of this so-called ontology pattern-based data integration is that a collection of ontology design patterns can act as a global schema that still contains sufficient semantics, but is also flexible and simple enough to be used by linked data providers. On the one hand, this differs from existing ontology-based solutions, which rely on large, monolithic ontologies that provide very rich semantics but enforce overly restrictive ontological choices and are hence shunned by many data providers. On the other hand, it also differs from purely linked data-based solutions, which offer simplicity and flexibility in data publishing but too little semantic interoperability. We demonstrate the feasibility of this idea through the actual development of a large-scale data integration project involving seven ocean science data repositories from five institutions in the U.S. In addition, we make two contributions as part of this dissertation work, which also play crucial roles in the aforementioned data integration project.
First, we develop a collection of more than a dozen ontology design patterns that capture the key ocean science notions occurring in the participating data repositories. These patterns contain axiomatizations of the key notions and were developed with intensive involvement from domain experts. The patterns were modeled in a systematic workflow to ensure the modularity, reusability, and flexibility of the whole pattern collection. Second, we propose so-called pattern views, which allow data providers to publish their data in a very simple intermediate schema, and show that they can greatly assist data providers in publishing their data without requiring a thorough understanding of the axiomatization of the patterns.

    OWL and Rules

    The relationship between the Web Ontology Language OWL and rule-based formalisms has been the subject of many discussions and research investigations, some of them controversial. From the many attempts to reconcile the two paradigms, we present some of the newest developments. More precisely, we show which kinds of rules can be modeled in the current version of OWL, and we show how OWL can be extended to incorporate rules. Finally, we give references to a large body of work on rules and OWL.

    Reducing Adversarial Vulnerability through Adaptive Training Batch Size

    Neural networks generalize well to the data distribution, to the extent that they are capable of fitting randomly labeled data. But they are also known to be extremely sensitive to adversarial examples. Batch Normalization (BatchNorm), a very common component of deep learning architectures, has been found to increase adversarial vulnerability. Fixup Initialization (Fixup Init) has been shown to be an alternative to BatchNorm that can considerably strengthen networks against adversarial examples. This robustness can be improved further by using a smaller batch size in training. The latter, however, comes with a tradeoff in the form of a significant increase in training time (up to ten times longer when reducing the batch size from the default 128 to 8 for ResNet-56). In this paper, we propose a workaround to this problem: starting the training with a small batch size and gradually increasing it during training. We empirically show that our proposal can still improve the adversarial robustness (by up to 5.73%) of ResNet-56 with Fixup Init and the default batch size of 128. At the same time, our proposal keeps the training time considerably shorter (only 4 times longer, instead of 10 times).
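
The idea of growing the batch size during training can be sketched as a simple schedule. The abstract does not say how the batch size is increased, so the doubling steps, the equal-length stages, and the name `batch_size_schedule` below are all illustrative assumptions, not the authors' method.

```python
def batch_size_schedule(epoch, total_epochs, start=8, end=128):
    """Return the batch size to use at a given epoch.

    Assumption (not from the paper): the batch size doubles from
    `start` to `end`, and each size is used for an equal share of
    the training epochs.
    """
    # Build the list of sizes: start, 2*start, ... capped at end.
    sizes = []
    size = start
    while size < end:
        sizes.append(size)
        size *= 2
    sizes.append(end)

    # Map the epoch to a stage index, clamped to the last stage.
    stage = min(epoch * len(sizes) // total_epochs, len(sizes) - 1)
    return sizes[stage]


if __name__ == "__main__":
    # Show the schedule for a hypothetical 10-epoch run.
    for epoch in range(10):
        print(epoch, batch_size_schedule(epoch, total_epochs=10))
```

In a training loop, the data loader would be rebuilt whenever the returned value changes. Early epochs then use the small batches associated with better robustness, while later epochs use the larger, faster ones.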

    Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia

    Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing the training of NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study encompasses various training approaches, paradigms, and data sizes, as well as a preliminary study of using large language models to generate synthetic parallel data for low-resource languages. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that despite limited computational resources and textual data, several of our NMT systems achieve competitive performance, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts. Comment: Accepted at SEALP 2023, Workshop in IJCNLP-AACL 202

    Ontology modeling with domain experts: The GeoVoCamp experience

    A series of GeoVoCamps, run at least twice a year in locations in the U.S., have focused on ontology design patterns as an approach to inform metadata and data models, and on applications in the geosciences. In this note, we retrace the brief history of the series and the rationale for the particular approach that was chosen, and report on the ongoing uptake of the approach.

    An Ontology Pattern for Oceanographic Cruises: Towards an Oceanographer's Dream of Integrated Knowledge Discovery

    EarthCube is a major effort of the National Science Foundation to establish a next-generation knowledge architecture for the broader geosciences. Data storage, retrieval, access, and reuse are central parts of this new effort. Currently, EarthCube is organized around several building blocks and research coordination networks. OceanLink is a semantics-enabled building block that aims to improve data retrieval and reuse via ontologies, Semantic Web technologies, and Linked Data for the ocean sciences. Cruises, in the sense of research expeditions, are central events for ocean scientists. Consequently, information about these cruises and the vessels involved has to be shared and made retrievable. For example, the ability to find cruises in the vicinity of physiographic features of interest, e.g., a hydrothermal vent field or a fracture zone, is of primary interest to oceanographers. In this paper, we use a design pattern-centric strategy to engineer ontologies for OceanLink. We provide a formal axiomatization of the introduced patterns and ontologies using the Web Ontology Language, explain design choices, discuss the reusability of our models, and provide lessons learned for future geo-ontologies.

    Nominal Schemas for Integrating Rules and Ontologies

    We propose a description-logic style extension of OWL DL, which includes DL-safe variable SWRL and seamlessly integrates datalog rules. Our language also has a tractable fragment, which we call ELP 2, covering OWL EL, OWL RL, most of OWL QL, and variable-restricted datalog.