    Relational Foundations For Functorial Data Migration

    We study the data transformation capabilities associated with schemas that are presented by directed multi-graphs and path equations. Unlike most approaches, which treat graph-based schemas as abbreviations for relational schemas, we treat graph-based schemas as categories. A schema S is a finitely-presented category, and the collection of all S-instances forms a category, S-inst. A functor F between schemas S and T, which can be generated from a visual mapping between graphs, induces three adjoint data migration functors, Σ_F : S-inst → T-inst, Π_F : S-inst → T-inst, and Δ_F : T-inst → S-inst. We present an algebraic query language FQL based on these functors, prove that FQL is closed under composition, prove that FQL can be implemented with the select-project-product-union relational algebra (SPCU) extended with a key-generation operation, and prove that SPCU can be implemented with FQL.

    On word problems in equational theories

    The Knuth-Bendix procedure for word problems in universal algebra is known to be very effective when it is applicable. However, the procedure will fail if it generates equations which cannot be oriented into rules (i.e. the system is not noetherian), or if it generates infinitely many rules (i.e. the system is not confluent). In 1980 Huet showed that even if the system is not confluent, the Knuth-Bendix procedure still yields a semi-decision procedure for word problems, provided that every equation can be oriented. In this paper we show that even if there are non-orientable equations, the Knuth-Bendix procedure can still be modified into a reasonably efficient semi-decision procedure for word problems in equational theories. Thus, we have lifted the noetherian requirement in the Knuth-Bendix procedure. Several confluence results are also given in the paper, together with some experiments. Our method can also be extended to more general theories. A comparison with related work is also given. The proof of completeness, which is an interesting subject in itself, employs a new proof technique based on a notion of transfinite semantic trees, which is designed for proving refutational completeness of theorem proving methods in general.

    Implementing an institutional repository for digital archive communities: Experiences from National Taiwan University

    This paper presents an empirical study of expanding and extending the DSpace digital repository system for an academic institution. National Taiwan University created a portal web site, the Digital Archives Resource Centre (DARC), to provide the digital archives communities with a preservation base for their rich research materials, and to provide the public with a helpful information retrieval service. Several modifications and extensions to the DSpace system were made in order to integrate database resources in different formats across university departments. In this paper, we present our empirical case, in which the DSpace system was adjusted in areas such as data submission, metadata mapping, digital rights management, user interface and visualization. Our discussion focuses on the applications in which digital archives communities benefit from the adapted DSpace system when disseminating their contents and preserving them long-term within the NTU campus. We also discuss the ways in which users benefit from the new system when searching and using digital archives.

    Various criteria in the evaluation of biomedical named entity recognition

    BACKGROUND: Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Generally speaking, two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks using these corpora. The two tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems that participated in the above two tasks. RESULTS: To analyze the difference between JNLPBA's and BioCreAtIvE's evaluation, we conduct Experiment 1 to evaluate the top four JNLPBA systems using BioCreAtIvE's classification scheme. We then compare them with the top four BioCreAtIvE systems. Among them, three systems participated in both tasks, and each has a lower F-score on JNLPBA than on BioCreAtIvE. In Experiment 2, we apply hypothesis testing and correlation coefficients to find alternatives to BioCreAtIvE's evaluation scheme. The results show that the right-match and left-match criteria do not differ significantly from BioCreAtIvE's. In Experiment 3, we propose a customized relaxed-match criterion that uses right match and merges JNLPBA's five NE classes into two, which achieves an F-score of 81.5%. In Experiment 4, we evaluate a range of five matching criteria from loose to strict on the top JNLPBA system and examine the percentage of false negatives. Our experiment gives the relative change in precision, recall and F-score as matching criteria are relaxed. CONCLUSION: In many applications, biomedical NEs could have several acceptable tags, which might differ only in their left or right boundaries. However, most corpora annotate only one of them. In our experiment, we found that right match and left match can be appropriate alternatives to JNLPBA's and BioCreAtIvE's matching criteria. In addition, our relaxed-match criterion demonstrates that users can define their own relaxed criteria that correspond more realistically to their application requirements.
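
To make the difference between strict and relaxed matching concrete, the following is a minimal sketch, not the evaluation code of either task. It assumes that "left match" means the predicted and gold entities share a left boundary and "right match" means they share a right boundary (the abstract does not define these terms), and that entity class labels must agree. The `Span` type, the `matches` and `f_score` helpers, and the toy data are all hypothetical.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A hypothetical entity mention: inclusive token offsets plus a class label."""
    start: int
    end: int
    label: str


def matches(pred: Span, gold: Span, criterion: str = "exact") -> bool:
    """Return True if pred counts as a hit for gold under the given criterion.

    Assumed definitions: "exact" needs both boundaries to agree,
    "left" only the left boundary, "right" only the right boundary.
    """
    if pred.label != gold.label:
        return False
    if criterion == "exact":
        return pred.start == gold.start and pred.end == gold.end
    if criterion == "left":
        return pred.start == gold.start
    if criterion == "right":
        return pred.end == gold.end
    raise ValueError(f"unknown criterion: {criterion}")


def f_score(preds: List[Span], golds: List[Span], criterion: str) -> float:
    """Compute the F-score (harmonic mean of precision and recall)."""
    hit_preds = sum(any(matches(p, g, criterion) for g in golds) for p in preds)
    hit_golds = sum(any(matches(p, g, criterion) for p in preds) for g in golds)
    precision = hit_preds / len(preds) if preds else 0.0
    recall = hit_golds / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the first prediction misses the leading token of the gold entity.
gold = [Span(0, 2, "protein"), Span(5, 6, "DNA")]
pred = [Span(1, 2, "protein"), Span(5, 6, "DNA")]

for criterion in ("exact", "left", "right"):
    print(criterion, f_score(pred, gold, criterion))
```

In this toy case the first prediction fails the exact and left criteria but is accepted under right match, which is the kind of boundary disagreement the abstract describes.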

    Multidimensional interactive fine-grained image retrieval
