Relational Foundations For Functorial Data Migration
We study the data transformation capabilities associated with schemas that
are presented by directed multi-graphs and path equations. Unlike most
approaches which treat graph-based schemas as abbreviations for relational
schemas, we treat graph-based schemas as categories. A schema $S$ is a
finitely-presented category, and the collection of all $S$-instances forms a
category, $S$-inst. A functor $F$ between schemas $S$ and $T$, which can be
generated from a visual mapping between graphs, induces three adjoint data
migration functors, $\Sigma_F : S$-inst $\to T$-inst, $\Pi_F : S$-inst $\to T$-inst, and $\Delta_F : T$-inst $\to S$-inst. We present an algebraic query
language FQL based on these functors, prove that FQL is closed under
composition, prove that FQL can be implemented with the
select-project-product-union relational algebra (SPCU) extended with a
key-generation operation, and prove that SPCU can be implemented with FQL.
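To make the idea of a data migration functor concrete, the following is a minimal sketch of the pullback direction only (usually written Delta_F), which turns an instance on the target schema into an instance on the source schema by composing with the schema mapping. The dictionary encoding, the schema names (`Emp`, `Dept`, `Person`, `Unit`) and the function `delta` are all hypothetical illustrations, not FQL's actual implementation; the two adjoints of the pullback need limits and colimits and are not shown.

```python
# Toy encoding (an assumption for illustration, not FQL's data model):
#   an instance = (sets per node, functions per edge as dicts).

# Target schema T: nodes Emp, Dept; edges manager: Emp->Emp, worksIn: Emp->Dept.
T_NODES = {"Emp": {"ann", "bob"}, "Dept": {"cs", "math"}}
T_EDGES = {
    "manager": {"ann": "ann", "bob": "ann"},
    "worksIn": {"ann": "cs", "bob": "math"},
}

# Schema mapping F from a source schema S (nodes Person, Unit) to T.
# Each S-edge is sent to a path of T-edges, recorded with its source node in S.
NODE_MAP = {"Person": "Emp", "Unit": "Dept"}
EDGE_MAP = {
    "memberOf": ("Person", ["worksIn"]),
    "bossUnit": ("Person", ["manager", "worksIn"]),
}


def delta(node_map, edge_map, t_nodes, t_edges):
    """Pull a T-instance back along F, giving an S-instance."""
    # Each S-node receives the rows of the T-node it is mapped to.
    s_nodes = {s: set(t_nodes[t]) for s, t in node_map.items()}
    s_edges = {}
    for s_edge, (src, path) in edge_map.items():
        # Start from the identity and compose the T-functions along the path.
        func = {row: row for row in s_nodes[src]}
        for t_edge in path:
            func = {row: t_edges[t_edge][val] for row, val in func.items()}
        s_edges[s_edge] = func
    return s_nodes, s_edges


S_NODES, S_EDGES = delta(NODE_MAP, EDGE_MAP, T_NODES, T_EDGES)
```

Here `S_EDGES["bossUnit"]` sends each person to the department of their manager, because the S-edge was mapped to the two-step path `manager` then `worksIn`.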
On word problems in equational theories
The Knuth-Bendix procedure for word problems in universal algebra is known to be very effective when it is applicable. However, the procedure will fail if it generates equations which cannot be oriented into rules (i.e. the system is not noetherian), or if it generates infinitely many rules (i.e. the system is not confluent). In 1980 Huet showed that even if the system is not confluent, the Knuth-Bendix procedure still yields a semi-decision procedure for word problems, provided that every equation can be oriented. In this paper we show that even if there are non-orientable equations, the Knuth-Bendix procedure can still be modified into a reasonably efficient semi-decision procedure for word problems in equational theories. Thus, we have lifted the noetherian requirement in the Knuth-Bendix procedure. Several confluence results are also given in the paper, together with some experiments. Our method can also be extended to more general theories. A comparison with related work is also given. The proof of completeness, which is an interesting subject in itself, employs a new proof technique that uses a notion of transfinite semantic trees designed for proving refutational completeness of theorem-proving methods in general.
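As a small illustration of what "solving a word problem with oriented rules" means, the sketch below decides equality of strings by rewriting both sides to normal form. It covers only the easy case where every equation has already been oriented and the rule set is terminating and confluent; it is not the modified procedure of the paper and does nothing for non-orientable equations. The rule set `RULES` (a monoid with `ab = ba` and `aa = 1`) and the function names are assumptions chosen for the example.

```python
# Hypothetical rule set: equations ba = ab and aa = (empty word), oriented
# left to right. This particular system is terminating and confluent.
RULES = [("ba", "ab"), ("aa", "")]


def normalize(word, rules):
    """Rewrite `word` until no rule applies and return the normal form."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                # Apply the rule to the first occurrence only, then restart.
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word


def equal_in_theory(u, v, rules):
    """Decide u = v by comparing normal forms (valid for confluent,
    terminating rule sets only)."""
    return normalize(u, rules) == normalize(v, rules)
```

For instance, `normalize("baba", RULES)` rewrites to `"bb"`, so `equal_in_theory("baba", "bb", RULES)` holds, while `"a"` and `"b"` stay distinct.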
Implementing an institutional repository for digital archive communities: Experiences from National Taiwan University
This paper presents an empirical study of expanding and extending the DSpace digital repository system for an academic institution. National Taiwan University created a portal web site, the Digital Archives Resource Centre (DARC), to provide the digital archives communities with a preservation base for their rich research materials, and to provide the public with a helpful information retrieval service. Several modifications and extensions to the DSpace system were made in order to integrate database resources in different formats from various university departments. In this paper, we present our empirical case, in which the DSpace system was adjusted in areas such as data submission, metadata mapping, digital rights management, user interface and visualization. Our discussion focuses on the applications in which digital archives communities benefit from the adapted DSpace system when disseminating their content and preserving it long term within the NTU campus. We also discuss the ways in which users benefit from the new system when searching and using digital archives.
Various criteria in the evaluation of biomedical named entity recognition
BACKGROUND: Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Generally speaking, two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks using these corpora. The two tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems that participated in the above two tasks. RESULTS: To analyze the difference between JNLPBA's and BioCreAtIvE's evaluation, we conduct Experiment 1 to evaluate the top four JNLPBA systems using BioCreAtIvE's classification scheme. We then compare them with the top four BioCreAtIvE systems. Among them, three systems participated in both tasks, and each has a lower F-score on JNLPBA than on BioCreAtIvE. In Experiment 2, we apply hypothesis testing and a correlation coefficient to find alternatives to BioCreAtIvE's evaluation scheme. The results show that the right-match and left-match criteria do not differ significantly from BioCreAtIvE's. In Experiment 3, we propose a customized relaxed-match criterion that uses right match and merges JNLPBA's five NE classes into two, which achieves an F-score of 81.5%. In Experiment 4, we evaluate a range of five matching criteria from loose to strict on the top JNLPBA system and examine the percentage of false negatives. Our experiment gives the relative change in precision, recall and F-score as matching criteria are relaxed. CONCLUSION: In many applications, biomedical NEs could have several acceptable tags, which might differ only in their left or right boundaries. However, most corpora annotate only one of them. In our experiment, we found that right match and left match can be appropriate alternatives to JNLPBA's and BioCreAtIvE's matching criteria. In addition, our relaxed-match criterion demonstrates that users can define their own relaxed criteria that correspond more realistically to their application requirements.
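The effect of relaxing a matching criterion can be sketched in a few lines. The definitions below are assumptions, since the abstract does not spell them out: an entity is a `(start, end, label)` span, "exact" requires both boundaries to agree, "left" requires only the start to agree, and "right" requires only the end to agree, always with the same label. The sample spans and the function names are hypothetical and do not reproduce the official JNLPBA or BioCreAtIvE scorers.

```python
def matches(pred, gold, criterion):
    """Return True if a predicted span matches a gold span under `criterion`."""
    (p_start, p_end, p_label), (g_start, g_end, g_label) = pred, gold
    if p_label != g_label:
        return False
    if criterion == "exact":
        return p_start == g_start and p_end == g_end
    if criterion == "left":
        return p_start == g_start
    if criterion == "right":
        return p_end == g_end
    raise ValueError(f"unknown criterion: {criterion}")


def precision_recall_f(preds, golds, criterion):
    """Compute precision, recall and F-score under the chosen criterion."""
    hit_preds = sum(any(matches(p, g, criterion) for g in golds) for p in preds)
    hit_golds = sum(any(matches(p, g, criterion) for p in preds) for g in golds)
    precision = hit_preds / len(preds) if preds else 0.0
    recall = hit_golds / len(golds) if golds else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up spans: the first prediction misses the left boundary by one token.
GOLD = [(0, 3, "protein"), (10, 14, "DNA")]
PRED = [(1, 3, "protein"), (10, 14, "DNA"), (20, 22, "protein")]

EXACT = precision_recall_f(PRED, GOLD, "exact")
RIGHT = precision_recall_f(PRED, GOLD, "right")
```

On this toy data the first prediction counts as an error under exact matching but as a hit under right matching, so the F-score rises from 0.4 to 0.8 when the criterion is relaxed.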