    Ontology Pattern-Based Data Integration

    Data integration is concerned with providing unified access to data residing at multiple sources. Such unified access is realized by having a global schema and a set of mappings between the global schema and the local schemas of the data sources, which specify how user queries posed against the global schema can be translated into queries against the local schemas. Data sources are typically developed and maintained independently and are thus highly heterogeneous. This complicates integration because of the lack of interoperability in terms of architecture, data format, and the syntax and semantics of the data. This dissertation presents a study on how small, self-contained ontologies, called ontology design patterns, can be employed to provide semantic interoperability in a cross-repository data integration system. The idea of this so-called ontology pattern-based data integration is that a collection of ontology design patterns can act as the global schema, one that still carries sufficient semantics but is also flexible and simple enough to be used by linked data providers. On one side, this differs from existing ontology-based solutions, which rely on large, monolithic ontologies that provide very rich semantics but enforce overly restrictive ontological choices and are hence shunned by many data providers. On the other side, it also differs from purely linked-data-based solutions, which offer simplicity and flexibility in data publishing but too little semantic interoperability. We demonstrate the feasibility of this idea through the development of a large-scale data integration project involving seven ocean science data repositories from five institutions in the U.S. In addition, we make two contributions as part of this dissertation work that also play crucial roles in the aforementioned project. First, we develop a collection of more than a dozen ontology design patterns that capture key notions of ocean science occurring in the participating data repositories. These patterns contain axiomatizations of the key notions and were developed with intensive involvement from domain experts; modeling followed a systematic workflow to ensure modularity, reusability, and flexibility of the whole pattern collection. Second, we propose so-called pattern views, which allow data providers to publish their data in very simple intermediate schemas, and show that they can greatly assist data providers in publishing their data without requiring a thorough understanding of the axiomatization of the patterns.
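
    A minimal sketch of the pattern-view idea described above, written with rdflib; the namespaces, class names, and the Deployment pattern here are illustrative assumptions, not the dissertation's actual patterns. A data provider publishes a single flat "view" triple, and an expansion step rewrites it into the richer pattern structure expected by the global schema.

    # Hypothetical pattern-view expansion: a flat provider triple is rewritten
    # into a richer (assumed) Deployment pattern before integration.
    from rdflib import Graph, Namespace, RDF, BNode

    EX = Namespace("http://example.org/pattern/")    # assumed pattern namespace
    VIEW = Namespace("http://example.org/view/")     # assumed pattern-view namespace

    provider_data = Graph()
    provider_data.add((EX.cruise42, VIEW.hasDeploymentOfInstrument, EX.ctd7))  # simple view triple

    integrated = Graph()
    for cruise, _, instrument in provider_data.triples((None, VIEW.hasDeploymentOfInstrument, None)):
        # Expand the shortcut into Cruise -> Deployment -> Instrument,
        # so the richer axioms of the pattern can apply downstream.
        deployment = BNode()
        integrated.add((cruise, EX.hasDeployment, deployment))
        integrated.add((deployment, RDF.type, EX.Deployment))
        integrated.add((deployment, EX.deployedInstrument, instrument))

    print(integrated.serialize(format="turtle"))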

    OWL and Rules

    The relationship between the Web Ontology Language OWL and rule-based formalisms has been the subject of many discussions and research investigations, some of them controversial. From the many attempts to reconcile the two paradigms, we present some of the newest developments. More precisely, we show which kinds of rules can be modeled in the current version of OWL, and we show how OWL can be extended to incorporate rules. We finally give references to a large body of work on rules and OWL.
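
    One concrete illustration of a rule that current OWL can already capture: the classic uncle rule hasParent(x,y) AND hasBrother(y,z) -> hasUncle(x,z) corresponds to an OWL 2 property chain axiom. The sketch below uses rdflib with a hypothetical namespace to emit the corresponding triples; rules that do not fit this chain shape generally require extensions such as SWRL.

    # Express hasParent o hasBrother -> hasUncle as owl:propertyChainAxiom
    # (illustrative namespace only; not from the paper).
    from rdflib import Graph, Namespace, BNode
    from rdflib.collection import Collection
    from rdflib.namespace import OWL, RDF

    EX = Namespace("http://example.org/family#")
    g = Graph()
    g.bind("owl", OWL)
    g.bind("ex", EX)

    chain = BNode()
    Collection(g, chain, [EX.hasParent, EX.hasBrother])   # RDF list (hasParent hasBrother)
    g.add((EX.hasUncle, RDF.type, OWL.ObjectProperty))
    g.add((EX.hasUncle, OWL.propertyChainAxiom, chain))   # ex:hasUncle owl:propertyChainAxiom (...)

    print(g.serialize(format="turtle"))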

    LexID: The Metadata and Semantic Knowledge Graph Construction of Indonesian Legal Document

    The Legal Fiction principle stipulates that the government needs to ensure the public availability of all of its legal documents. Unfortunately, the text-based search services the government provides cannot return satisfactory answers in retrieval scenarios that require a proper representation of the relationships between various legal documents. A key problem here is the lack of an explicit representation of such relationships behind the employed retrieval engines. We aim to address this problem by proposing the LexID knowledge graph (KG), which provides an explicit knowledge representation of the Indonesian legal domain usable for such retrieval purposes. The KG contains both legal metadata and the semantic content of the legal clauses in the documents' articles, modeled using a formal vocabulary from the LexID ontology, which is also presented in this paper. The KG is constructed from thousands of Indonesian legal documents. Since the government-regulated procedure for writing a legal document is clear and detailed, we use a rule-based approach to construct our KG. Finally, we describe several use cases of the KG addressing different retrieval needs. In addition, we evaluated the quality of our KG by measuring its ability to answer questions and found that LexID answers them with a macro-average F1 score of about 0.91.
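
    A toy illustration of the rule-based construction flavour described above; the regular expression, namespace, and property name are stand-ins chosen for this sketch, not LexID's actual extraction rules. A pattern keyed to the fixed phrasing of Indonesian statute titles extracts an "amends" link between two laws and emits KG triples.

    # Hypothetical extraction rule: match "UNDANG-UNDANG ... TENTANG PERUBAHAN ATAS ..."
    # ("Law ... concerning the amendment of Law ...") and emit an amends triple.
    import re
    from rdflib import Graph, Namespace

    LEX = Namespace("http://example.org/lexid#")   # assumed namespace
    g = Graph()

    text = "UNDANG-UNDANG NOMOR 19 TAHUN 2016 TENTANG PERUBAHAN ATAS UNDANG-UNDANG NOMOR 11 TAHUN 2008"
    rule = re.compile(r"UNDANG-UNDANG NOMOR (\d+) TAHUN (\d+) TENTANG PERUBAHAN ATAS "
                      r"UNDANG-UNDANG NOMOR (\d+) TAHUN (\d+)")

    m = rule.search(text)
    if m:
        new_doc = LEX[f"UU_{m.group(2)}_{m.group(1)}"]   # e.g. lex:UU_2016_19
        old_doc = LEX[f"UU_{m.group(4)}_{m.group(3)}"]   # e.g. lex:UU_2008_11
        g.add((new_doc, LEX.amends, old_doc))            # assumed property name

    print(g.serialize(format="turtle"))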

    Reducing Adversarial Vulnerability through Adaptive Training Batch Size

    Neural networks generalize well to the data distribution, to the extent that they are even capable of fitting randomly labeled data. However, they are also known to be extremely sensitive to adversarial examples. Batch Normalization (BatchNorm), a very common component of deep learning architectures, has been found to increase adversarial vulnerability. Fixup Initialization (Fixup Init) has been shown to be an alternative to BatchNorm that can considerably strengthen networks against adversarial examples. This robustness can be improved further by employing a smaller batch size during training. The latter, however, comes with a tradeoff in the form of a significant increase in training time (up to ten times longer when reducing the batch size from the default 128 to 8 for ResNet-56). In this paper, we propose a workaround to this problem: start training with a small batch size and gradually increase it during training. We empirically show that our proposal still improves the adversarial robustness (by up to 5.73%) of ResNet-56 with Fixup Init and the default batch size of 128. At the same time, our proposal keeps the training time considerably shorter (only 4 times longer, instead of 10 times).
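
    A minimal PyTorch sketch of the adaptive-batch-size idea: training starts with a small batch size and the DataLoader is rebuilt with larger batch sizes as the epochs progress. The schedule, model, and data below are illustrative placeholders, not the paper's ResNet-56 setup.

    # Toy training loop with a growing batch size (illustrative schedule only).
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Hypothetical schedule: small batches early (better robustness), large later (faster).
    batch_schedule = {0: 8, 3: 32, 6: 128}
    loader = None
    for epoch in range(10):
        if epoch in batch_schedule:
            loader = DataLoader(dataset, batch_size=batch_schedule[epoch], shuffle=True)
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()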

    Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia

    Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing the training of NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study encompasses various training approaches, paradigms, and data sizes, as well as a preliminary study into using large language models to generate synthetic parallel data for low-resource languages. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that despite limited computational resources and textual data, several of our NMT systems achieve competitive performance, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts.
    Comment: Accepted at SEALP 2023, a workshop at IJCNLP-AACL 2023.
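
    For orientation only, a hedged sketch of a pretrained multilingual baseline for one of the studied directions (Indonesian to Javanese); the checkpoint name and language codes are assumptions about a publicly available NLLB model, and this is not one of the systems benchmarked in the paper.

    # Translate Indonesian -> Javanese with an (assumed) pretrained NLLB checkpoint.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "facebook/nllb-200-distilled-600M"   # assumed publicly available checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="ind_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer("Selamat pagi, apa kabar?", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("jav_Latn"),  # target language tag
        max_new_tokens=64,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))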

    Entity and Relation Linking for Knowledge Graph Question Answering Using Gradual Searching

    Knowledge graph question answering (KGQA) systems play an important role in retrieving data from a knowledge graph (KG). With such a system, regular users can access data in a KG without having to construct a formal SPARQL query. KGQA systems receive a natural language question (NLQ) and translate it into a SPARQL query through three main tasks, namely entity and relation detection, entity and relation linking, and query construction. However, the translation is not trivial due to lexical gaps and entity ambiguity that may occur during entity or relation linking. To address the lexical gap challenge, this research proposed an approach based on multiclass classification that assigns an NLQ, with its entity occurrences already detected, to a category corresponding to a KG relation. Next, to solve the entity ambiguity challenge, this research proposed a three-stage search procedure that determines the appropriate KG entities associated with the NLQ entities, given the correspondence between the NLQ and a particular KG relation. The three stages are text-based searching, vector-based searching, and entity and relation pairing. The proposed approach was evaluated on the SimpleQuestions and LC-QuAD 2.0 datasets. The experiments demonstrated that the proposed approach outperformed the state-of-the-art baseline. For the relation linking task, the proposed approach reached 89.87% and 74.83% recall on SimpleQuestions and LC-QuAD 2.0, respectively. It also achieved 91.74% and 61.96% recall on the entity linking task for SimpleQuestions and LC-QuAD 2.0, respectively.
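
    A high-level sketch of the three-stage search on a toy knowledge graph; all data, scores, and function names are hypothetical stand-ins, and stage 2 is only a placeholder where a real system would score dense embeddings.

    # Toy three-stage entity search: text search, vector re-ranking, relation pairing.
    from difflib import SequenceMatcher

    # Toy KG: entity id -> label, and entity id -> relations it participates in.
    kg_labels = {"Q1": "Barack Obama", "Q2": "Michelle Obama", "Q3": "Obama, Fukui"}
    kg_relations = {"Q1": {"spouse", "birthPlace"}, "Q2": {"spouse"}, "Q3": {"country"}}

    def text_search(mention, k=10):
        # Stage 1: lexical matching of the NLQ mention against entity labels.
        ranked = sorted(kg_labels, key=lambda eid: SequenceMatcher(
            None, mention.lower(), kg_labels[eid].lower()).ratio(), reverse=True)
        return ranked[:k]

    def vector_search(mention, candidates):
        # Stage 2: re-rank candidates by embedding similarity (placeholder only).
        return candidates

    def pair_entity_relation(candidates, predicted_relation):
        # Stage 3: keep entities that actually participate in the predicted KG relation.
        return [eid for eid in candidates if predicted_relation in kg_relations[eid]]

    print(pair_entity_relation(vector_search("Obama", text_search("Obama")), "spouse"))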

    Ontology modeling with domain experts: The GeoVoCamp experience

    A series of GeoVoCamps, run at least twice a year at locations in the U.S., has focused on ontology design patterns as an approach to inform metadata and data models, and on applications in the GeoSciences. In this note, we retrace the brief history of the series as well as the rationale for the particular approach that was chosen, and report on the ongoing uptake of the approach.