
    Probabilistic Relational Model Benchmark Generation

    The validation of any database mining methodology goes through an evaluation process in which the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that make it possible to check probabilistic dependencies among attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relational data mining context and enable effective and robust reasoning over relational data. Although many works have focused, separately, on the generation of random Bayesian networks and of relational databases, we have found no such work for PRMs. This paper fills this gap with an algorithmic approach for generating random PRMs from scratch. The proposed method generates PRMs, as well as synthetic relational data, from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest not only to machine learning researchers, who can evaluate their proposals in a common framework, but also to database designers evaluating the effectiveness of the components of a database management system.
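The abstract does not spell out the generation algorithm, so the following is only a minimal Python sketch of the general pipeline it describes (random relational schema, then a random acyclic set of probabilistic dependencies, then synthetic data), not the authors' method. It assumes binary attributes, at most one foreign key per class, and parents drawn either from the same class or from the class reached through that foreign key; `RelClass`, `random_schema`, `random_dependencies`, and `sample_database` are hypothetical names.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelClass:
    name: str
    attributes: list          # names of binary attributes
    reference: Optional[str]  # class reached through this class's single foreign key, if any


def random_schema(n_classes, max_attrs, rng):
    """Random schema in which class i may reference one earlier class (acyclic by construction)."""
    classes = []
    for i in range(n_classes):
        attrs = [f"C{i}_a{j}" for j in range(rng.randint(1, max_attrs))]
        ref = classes[rng.randrange(i)].name if i > 0 and rng.random() < 0.7 else None
        classes.append(RelClass(f"C{i}", attrs, ref))
    return classes


def random_dependencies(classes, p_edge, max_parents, rng):
    """Pick parents for every attribute and a random CPD P(attr = 1 | parent values)."""
    by_name = {c.name: c for c in classes}
    model = {}
    for c in classes:
        ref_attrs = by_name[c.reference].attributes if c.reference else []
        for j, attr in enumerate(c.attributes):
            # Candidates all precede attr in a global order, so the dependency graph stays a DAG.
            candidates = [("self", a) for a in c.attributes[:j]] + [("ref", a) for a in ref_attrs]
            parents = [p for p in candidates if rng.random() < p_edge][:max_parents]
            cpd = {vals: rng.random() for vals in itertools.product((0, 1), repeat=len(parents))}
            model[attr] = (parents, cpd)
    return model


def sample_database(classes, model, n_rows, rng):
    """Forward-sample n_rows objects per class, following the foreign key for 'ref' parents."""
    tables = {}
    for c in classes:  # referenced classes come first, so their rows already exist
        rows = []
        for _ in range(n_rows):
            row = {}
            if c.reference:
                row["fk"] = rng.randrange(len(tables[c.reference]))
            for attr in c.attributes:
                parents, cpd = model[attr]
                vals = tuple(row[a] if src == "self" else tables[c.reference][row["fk"]][a]
                             for src, a in parents)
                row[attr] = int(rng.random() < cpd[vals])
            rows.append(row)
        tables[c.name] = rows
    return tables
```

For example, with `rng = random.Random(0)` and `schema = random_schema(4, 3, rng)`, calling `sample_database(schema, random_dependencies(schema, 0.5, 2, rng), 100, rng)` yields four tables of 100 rows each. Restricting parents to earlier attributes is the simplest way to guarantee acyclicity; a fuller generator would also allow longer foreign-key paths and aggregated parents.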

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases to store and manage their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that biological databases need. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking, to track the dependencies and derivations among data items; (3) update authorization, to support data curation via content-based rather than identity-based authorization; and (4) new access methods and supporting operators for pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US
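The abstract lists local dependency tracking without describing its mechanics. As an illustration only, here is a minimal Python sketch assuming that every derived item records the items it was derived from and that an update to any item flags all of its transitive dependents as outdated; `DependencyTracker` and the item identifiers are hypothetical and do not reflect bdbms's actual interface or SQL extension.

```python
from collections import defaultdict, deque


class DependencyTracker:
    """Hypothetical sketch: records derivations between data items and, when an item
    is updated, marks every item transitively derived from it as outdated."""

    def __init__(self):
        self.dependents = defaultdict(set)  # source item -> items derived directly from it
        self.outdated = set()

    def add_derivation(self, derived, sources):
        for source in sources:
            self.dependents[source].add(derived)

    def update(self, item):
        # The updated item is current again; breadth-first search flags its dependents.
        self.outdated.discard(item)
        queue, seen = deque([item]), {item}
        while queue:
            current = queue.popleft()
            for dep in self.dependents[current]:
                if dep not in seen:
                    seen.add(dep)
                    self.outdated.add(dep)
                    queue.append(dep)


tracker = DependencyTracker()
tracker.add_derivation("protein:P1", ["gene:G1"])     # protein sequence derived from a gene
tracker.add_derivation("function:F1", ["protein:P1"]) # functional annotation derived from the protein
tracker.update("gene:G1")
print(sorted(tracker.outdated))  # ['function:F1', 'protein:P1']
```

Flagging rather than recomputing leaves the decision to a curator, which is one plausible design when derivations involve lab procedures that a DBMS cannot re-run.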

    A Method for Mapping XML DTD to Relational Schemas In The Presence Of Functional Dependencies

    The eXtensible Markup Language (XML) has recently emerged as a standard for data representation and interchange on the web. As the amount of XML data on the web grows, so does the pressure to manage it efficiently. Because relational databases are the most widely used technology for storing and managing XML, XML data must frequently be mapped to relations. There are many ways to perform this mapping, and many approaches exist in the literature, especially given the flexible nesting structures that XML allows. This raises an important question: are some mappings "better" than others? To approach this problem, we draw on classical relational database design through normalization, which is based on the well-known concept of functional dependency. Functional dependencies specify the constraints that may hold in the relations and guide the design while removing semantic data redundancy, leading to a well-normalized relational schema. To obtain a well-normalized relational schema for XML, the concept of functional dependency must be extended from relations to XML and used as guidance for the design. Although functional dependency definitions for XML exist, they are not yet standard and still have several limitations; in particular, they cannot specify constraints in the presence of the shared and local elements that occur in XML documents. This study proposes a new definition of functional dependency constraints for XML that is general enough to specify such constraints and to discover semantic redundancies in XML documents. The focus of the study is on producing an optimal mapping approach in the presence of XML functional dependencies (XFDs), keys, and Document Type Definition (DTD) constraints, as guidance for generating a good relational schema.
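To make the relational starting point concrete: a functional dependency X → Y holds in a relation when any two tuples that agree on X also agree on Y. The Python sketch below checks only this classical relational definition, not the paper's proposed XFD definition; `fd_holds` and the `enrolment` relation are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in `rows`: tuples that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# A hypothetical relation obtained by flattening course/student XML elements.
enrolment = [
    {"course": "DB101", "title": "Databases", "student": "ann"},
    {"course": "DB101", "title": "Databases", "student": "bob"},
    {"course": "AI200", "title": "AI", "student": "ann"},
]
print(fd_holds(enrolment, ["course"], ["title"]))    # True
print(fd_holds(enrolment, ["student"], ["course"]))  # False
```

Because course → title holds while course alone is not a key of this relation, the title is repeated for every enrolled student. Normalization removes that redundancy by decomposing into (course, title) and (course, student).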
To approach the mapping problem, three components are explored: the mapping algorithm, functional dependencies for XML, and the implication process. XML implication determines which other dependencies are guaranteed to hold in a relational representation of XML, given that a set of functional dependencies holds in the XML document. This requires deriving a set of inference rules for the implication process: given a DTD and user-defined XFDs, the inference rules generate further XFDs that are guaranteed to hold in the XML. The mapping algorithm has been implemented in a tool called XtoR. An analysis of the mapping approach shows that XtoR significantly improves the generated relational schema for XML by reducing data and relation redundancy, removing dangling relations, and removing association problems. The findings suggest that if one wants to use an RDBMS to manage XML data, the mapping from XML documents to relations must be based on functional dependency constraints.
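The abstract does not list the paper's XFD inference rules. For orientation, the relational counterpart is the attribute-closure algorithm: Armstrong's axioms (reflexivity, augmentation, transitivity) are sound and complete for relational FDs, and a set F implies X → Y exactly when Y ⊆ X⁺, the closure of X under F. A minimal Python sketch of that relational algorithm follows; it is not the paper's XFD implication procedure, and the attribute names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute derivable from X under the given FDs.

    Each FD is a pair (lhs, rhs) of sets of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left-hand side, it must also contain the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """fds implies lhs -> rhs exactly when rhs is contained in closure(lhs)."""
    return set(rhs) <= closure(lhs, fds)


fds = [({"isbn"}, {"title"}), ({"title"}, {"publisher"})]
print(implies(fds, {"isbn"}, {"publisher"}))  # True, by transitivity
print(implies(fds, {"publisher"}, {"isbn"}))  # False
```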

    UML Class Diagram or Entity Relationship Diagram: An Object Relational Impedance Mismatch

    It is now nearly 30 years since Peter Chen's watershed paper "The Entity-Relationship Model: Toward a Unified View of Data" [1]. The entity relationship model, with its variations and extensions, has been taught in colleges and universities for many years. In his original paper, Peter Chen looked at converting his new ER model to the then-existing data structure diagrams for the Network model. In recent years there has been a tendency to use Unified Modelling Language (UML) class diagrams for conceptual modeling of relational databases, and several popular course textbooks use UML notation to some degree [2] [3]. However, object and relational technologies are based on different paradigms. In this paper we argue that the UML class diagram is more of a logical model (implementation specific). ER diagrams, on the other hand, are at the conceptual level of database design, dealing with the main items and their relationships rather than with implementation-specific detail. UML focuses on OOAD (Object Oriented Analysis and Design) and is navigational and program dependent, whereas the relational model is set based and exhibits data independence. The ER model provides a well-established set of rules for mapping to a relational model. In this paper we look specifically at the areas that can cause problems for the novice database designer because of this conceptual mismatch between two different paradigms: first, transferring the mapping of a weak entity from an Entity Relationship model to UML, and second, the representation of structural constraints between objects. We look at the mixture of notations that students mistakenly use when modeling, which often results from different notations being used in different courses throughout their degree. Several of the currently popular textbooks use a variation of ER, UML, or both for teaching database modeling.
At the moment, a student who picks up a textbook could be faced with any of the following: one of the many ER variations; UML; UML and a variation of ER covered separately; or UML and ER merged together. We regard this problem as a conceptual impedance mismatch. This problem is documented in [21], whose authors have produced a catalogue of impedance mismatch problems between object-relational and relational paradigms. We regard the problems of using UML class diagrams for relational database design as a conceptual impedance mismatch, since the Entity Relationship model does not have the structures needed to deal with Object Oriented concepts.
Keywords: EERD, UML Class Diagram, Relational Database Design, Structural Constraints, relational and object database impedance mismatch.
The ER model was originally put forward by Chen [1], and extensions have subsequently been added to give the original model further semantics, mainly the concepts of specialisation, generalisation and aggregation. In this paper we refer to an Entity-Relationship model (ER) as the basic model and to an extended or enhanced entity-relationship model (EER) as a model that includes these extra concepts. The ER and EER models are also often used to aid communication between the designer and the user at the requirements analysis stage. In this paper, when we use the term "conceptual model" we mean a model that is not implementation specific.
ISBN: 978-84-616-3847-5 3594. Peer reviewed
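The weak-entity mapping the abstract refers to follows a standard ER-to-relational rule: the weak entity becomes a relation that includes the owner's primary key as a foreign key, and the owner's key together with the weak entity's partial key forms the composite primary key. The Python sketch below emits the corresponding SQL DDL. `Entity`, `map_weak_entity`, the Employee/Dependent example, the single column type `TEXT`, and the `ON DELETE CASCADE` choice are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    key: list         # key attributes (the partial key, for a weak entity)
    attributes: list  # remaining attributes


def map_weak_entity(weak, owner, col_type="TEXT"):
    """Textbook ER-to-relational rule for a weak entity: the relation includes the owner's
    key as a foreign key, and owner key + partial key form the composite primary key."""
    fk_cols = [f"{owner.name.lower()}_{k}" for k in owner.key]
    pk_cols = fk_cols + weak.key
    lines = [f"  {c} {col_type} NOT NULL" for c in pk_cols]
    lines += [f"  {c} {col_type}" for c in weak.attributes]
    lines.append(f"  PRIMARY KEY ({', '.join(pk_cols)})")
    # Deleting an owner deletes its dependents, since they cannot be identified without it.
    lines.append(
        f"  FOREIGN KEY ({', '.join(fk_cols)}) REFERENCES {owner.name} "
        f"({', '.join(owner.key)}) ON DELETE CASCADE"
    )
    return f"CREATE TABLE {weak.name} (\n" + ",\n".join(lines) + "\n);"


employee = Entity("Employee", ["ssn"], ["name"])
dependent = Entity("Dependent", ["dep_name"], ["birth_date"])
print(map_weak_entity(dependent, employee))
```

The composite primary key `(employee_ssn, dep_name)` makes a dependent's identity relative to its owner. That identifying dependency is what an ER diagram states explicitly.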

    Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

    This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized, and in-house NLP tools. The system supports portability across institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize the necessary data elements. It uses UMLS to perform domain adaptation when integrating generic-domain NLP tools. It also features stand-off annotations, which are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve stand-off annotations by specifying positional requirements. We also developed a utility that converts inline annotations to stand-off annotations, enabling the reuse of clinical text datasets with inline annotations. We evaluated the system on several NLP-facilitated tasks, including computational phenotyping for lymphoma patients and semantic relation extraction from clinical notes. These experiments demonstrate the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as a regular paper
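The abstract mentions an interval-tree-based search engine for stand-off annotations. LAPNLP itself is written in Lisp. The following is only a Python sketch of a centered interval tree, assuming annotations are non-empty half-open character spans [start, end) into the original document. `Annotation`, `IntervalTree`, and `overlapping` are hypothetical names, not LAPNLP's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first annotated character (inclusive)
    end: int     # offset just past the last annotated character (exclusive)
    label: str

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("annotation span must be non-empty")


@dataclass
class _Node:
    center: int
    by_start: List[Annotation]     # spans containing center, ascending start
    by_end_desc: List[Annotation]  # same spans, descending end
    left: Optional["_Node"]        # spans ending at or before center
    right: Optional["_Node"]       # spans starting after center


class IntervalTree:
    """Centered interval tree over half-open [start, end) stand-off annotations."""

    def __init__(self, annotations):
        self.root = self._build(list(annotations))

    def _build(self, anns):
        if not anns:
            return None
        points = sorted(p for a in anns for p in (a.start, a.end))
        center = points[(len(points) - 1) // 2]  # lower median endpoint keeps both halves smaller
        here = [a for a in anns if a.start <= center < a.end]
        return _Node(
            center,
            sorted(here, key=lambda a: a.start),
            sorted(here, key=lambda a: a.end, reverse=True),
            self._build([a for a in anns if a.end <= center]),
            self._build([a for a in anns if a.start > center]),
        )

    def overlapping(self, qs, qe):
        """Return every annotation that overlaps the query span [qs, qe)."""
        if qs >= qe:
            raise ValueError("query span must be non-empty")
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            if qe <= node.center:    # query lies left of center: right subtree cannot match
                for a in node.by_start:
                    if a.start >= qe:
                        break
                    found.append(a)
                stack.append(node.left)
            elif qs > node.center:   # query lies right of center: left subtree cannot match
                for a in node.by_end_desc:
                    if a.end <= qs:
                        break
                    found.append(a)
                stack.append(node.right)
            else:                    # query contains center: every span stored here matches
                found.extend(node.by_start)
                stack.extend((node.left, node.right))
        return found
```

Keeping each node's spans sorted by start and by end lets a query stop scanning a node as soon as the remaining spans cannot overlap. It also skips whole subtrees that lie entirely before or after the requested span.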