Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology requires an evaluation
process in which the availability of benchmarks is essential. In this paper, we
aim to randomly generate relational database benchmarks that allow checking
probabilistic dependencies among attributes. We are particularly interested
in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs)
to a relational data mining context and enable effective and robust reasoning
over relational data. Although a range of works have focused, separately, on
the generation of random Bayesian networks and of random relational databases,
no work has been identified that does the same for PRMs. This paper fills this
gap with an algorithmic approach for generating random PRMs from scratch. The
proposed method generates PRMs, as well as synthetic relational data, from a
randomly generated relational schema and a random set of probabilistic
dependencies. This can be of interest not only to machine learning researchers,
who gain a common framework in which to evaluate their proposals, but also to
database designers, who can use the benchmarks to evaluate the effectiveness of
the components of a database management system.
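A core building block of such a generator is producing a random Bayesian network: a random DAG plus random conditional probability tables. The sketch below illustrates that step only; it is a generic, simplified illustration, not the authors' PRM algorithm, and the function names and binary-variable assumption are invented here.

```python
import random

def random_dag(n_nodes, edge_prob, rng):
    """Generate a random DAG by fixing a topological order 0..n-1 and
    adding each forward edge independently with probability edge_prob."""
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                edges.append((i, j))  # i -> j respects the order, so no cycles
    return edges

def random_cpts(n_nodes, edges, rng):
    """Attach a random conditional probability table to each (binary) node:
    one distribution over {0, 1} per joint parent configuration."""
    parents = {v: [u for (u, w) in edges if w == v] for v in range(n_nodes)}
    cpts = {}
    for v in range(n_nodes):
        table = []
        for _ in range(2 ** len(parents[v])):
            p = rng.random()
            table.append((p, 1.0 - p))  # distribution over the node's states
        cpts[v] = table
    return parents, cpts

rng = random.Random(0)
edges = random_dag(5, 0.4, rng)
parents, cpts = random_cpts(5, edges, rng)
```

A PRM generator would apply the same idea at the schema level, drawing the dependency structure over attributes of related tables rather than over flat variables.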
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and supporting operators for pattern matching over various types of
compressed biological data. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works, and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR), January
7-10, 2007, Asilomar, California, USA.
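The third functionality above, content-based update authorization, grants or denies an update based on the content of the data item rather than only on the user's identity. bdbms realizes this through an SQL extension; the sketch below is only a language-neutral illustration of the idea, with hypothetical role names and predicates invented here.

```python
# Content-based authorization: a rule is a predicate over the row's content,
# evaluated at update time, instead of a static identity-based grant.
rules = {
    # Hypothetical curator rule: may only edit records still marked 'draft'.
    "curator": lambda row: row["status"] == "draft",
}

def can_update(role, row):
    """Allow the update only if the role has a rule and the row satisfies it."""
    rule = rules.get(role)
    return rule is not None and rule(row)
```

The same curator is thus allowed to fix a draft annotation but blocked from silently altering an already-published one, which is the data-curation scenario the abstract describes.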
A Method for Mapping XML DTD to Relational Schemas In The Presence Of Functional Dependencies
The eXtensible Markup Language (XML) has recently emerged as a standard for
data representation and interchange on the web. With so much XML data on the
web, the pressure is now on managing that data efficiently. Since relational
databases are the most widely used technology for managing and storing XML,
XML frequently needs to be mapped to relations. There are many different ways
to perform this mapping, and many approaches exist in the literature,
especially considering the flexible nesting structures that XML allows. This
gives rise to the following important problem: are some mappings 'better' than
others? To approach this problem, we refer to classical relational database
design through normalization, which is based on the well-known concept of
functional dependency. This concept is used to specify the constraints that may
hold in relations and to guide the design while removing semantic data
redundancies, leading to a well-normalized relational schema without data
redundancy. To achieve a similarly well-normalized relational schema for XML,
the concept of functional dependency in relations needs to be extended to XML
and used as guidance for the design. Although functional dependency definitions
for XML exist, these definitions are not yet standard and still have several
limitations. In particular, they cannot specify constraints in the presence of
shared and local elements in an XML document. In this study, a new definition
of functional dependency constraints for XML is proposed that is general enough
to specify constraints and to discover semantic redundancies in XML documents.
The focus of this study is on how to produce an optimal mapping approach in the
presence of XML functional dependencies (XFDs), keys, and Document Type
Definition (DTD) constraints, as guidance for generating a good relational
schema. To approach the mapping problem, three different components are
explored: the mapping algorithm, functional dependency for XML, and the
implication process. The study of XML implication is important for determining
which other dependencies are guaranteed to hold in a relational representation
of XML, given that a set of functional dependencies holds in the XML document.
This leads to the need to derive a set of inference rules for the implication
process. In the presence of a DTD and user-defined XFDs, other sets of XFDs
that are guaranteed to hold in XML can be generated using the inference rules.
The mapping algorithm has been implemented in a tool called XtoR. The quality
of the mapping approach has been analyzed, and the results show that XtoR
yields significantly better relational schemas for XML: it reduces data and
relation redundancy, removes dangling relations, and removes association
problems. The findings suggest that if one wants to use an RDBMS to manage XML
data, the mapping from XML documents to relations must be based on functional
dependency constraints.
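The classical FD-guided design this work builds on can be made concrete with the textbook BCNF decomposition: compute attribute closures, find a dependency whose left-hand side is not a superkey, and split the relation on it. The sketch below is that standard algorithm, not the XtoR mapping itself; the sample relation and FD at the end are invented for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of
    (lhs, rhs) pairs of frozensets of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violation(relation, fds):
    """Return a non-trivial FD X -> Y over `relation` whose X is not a
    superkey, or None if the relation is already in BCNF."""
    for lhs, rhs in fds:
        if lhs <= relation and rhs <= relation and not rhs <= lhs:
            if not relation <= closure(lhs, fds):
                return lhs, rhs
    return None

def decompose(relation, fds):
    """Classical BCNF decomposition: split on a violating FD until none remain."""
    bad = bcnf_violation(relation, fds)
    if bad is None:
        return [relation]
    lhs, rhs = bad
    r1 = frozenset(lhs | rhs)          # the dependency itself
    r2 = frozenset((relation - rhs) | lhs)  # the rest, keeping the join key
    return decompose(r1, fds) + decompose(r2, fds)

# Illustrative example: author determines authorEmail, so storing all three
# attributes together duplicates the email for every title by that author.
R = frozenset({"title", "author", "authorEmail"})
fds = [(frozenset({"author"}), frozenset({"authorEmail"}))]
parts = decompose(R, fds)
```

An XML-to-relational mapping guided by XFDs applies the same principle: dependencies extracted from the document and DTD decide where the generated schema must be split to avoid redundancy.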
UML Class Diagram or Entity Relationship Diagram : An Object Relational Impedance Mismatch
It is now nearly 30 years since Peter Chen's watershed paper "The Entity-Relationship Model – Towards a Unified View of Data" [1]. The entity-relationship model, and variations and extensions to it, have been taught in colleges and universities for many years. In his original paper, Peter Chen looked at converting his new ER model to the then-existing data structure diagrams for the Network model. In recent years there has been a tendency to use a Unified Modelling Language (UML) class diagram for conceptual modeling for relational databases, and several popular course textbooks use UML notation to some degree [2] [3]. However, Object and Relational technology are based on different paradigms. In this paper we argue that the UML class diagram is more of a logical model (implementation specific). ER diagrams, on the other hand, are at a conceptual level of database design, dealing with the main items and their relationships and not with implementation-specific detail. UML focuses on OOAD (Object Oriented Analysis and Design) and is navigational and program dependent, whereas the relational model is set based and exhibits data independence. The ER model provides a well-established set of mapping rules for mapping to a relational model. In this paper we look specifically at the areas which can cause problems for the novice database designer due to this conceptual mismatch of two different paradigms: firstly, transferring the mapping of a weak entity from an Entity-Relationship model to UML, and secondly, the representation of structural constraints between objects. We look at the mixture of notations which students mistakenly use when modeling. This is often the result of different notations being used on different courses throughout their degree. Several of the popular textbooks at the moment use either a variation of ER, UML, or both for teaching database modeling.
At the moment, if a student picks up a textbook they could be faced with any of: one of the many ER variations, UML, UML and a variation of ER covered separately, or UML and ER merged together. We regard this problem as a conceptual impedance mismatch. The problem is documented in [21], which produces a catalogue of impedance mismatch problems between the object and relational paradigms. We regard the problems of using UML class diagrams for relational database design as a conceptual impedance mismatch, as the Entity-Relationship model does not have the structures in the model to deal with Object Oriented concepts.
Keywords: EERD, UML Class Diagram, Relational Database Design, Structural Constraints, relational and object database impedance mismatch.
The ER model was originally put forward by Chen [1], and extensions have subsequently been added to give the original model further semantics, mainly the concepts of specialisation, generalisation, and aggregation. In this paper we refer to an Entity-Relationship model (ER) as the basic model, and to an extended or enhanced entity-relationship model (EER) as a model which includes the extra concepts. The ER and EER models are also often used to aid communication between the designer and the user at the requirements analysis stage. In this paper, when we use the term "conceptual model" we mean a model that is not implementation specific.
ISBN: 978-84-616-3847-5. Peer reviewed.
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed
LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard,
customized and in-house developed NLP tools. Our system facilitates portability
across different institutions and data systems by incorporating an enriched
Common Data Model (CDM) to standardize necessary data elements. It utilizes
UMLS to perform domain adaptation when integrating generic domain NLP tools. It
also features stand-off annotations that are specified by positional reference
to the original document. We built an interval tree based search engine to
efficiently query and retrieve the stand-off annotations by specifying
positional requirements. We also developed a utility to convert an inline
annotation format to stand-off annotations to enable the reuse of clinical text
datasets with inline annotations. We experimented with our system on several
NLP facilitated tasks including computational phenotyping for lymphoma patients
and semantic relation extraction for clinical notes. These experiments
showcased the broader applicability and utility of LAPNLP.
Comment: 6 pages, accepted by IEEE BIBM 2018 as a regular paper.
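The interval-tree query over stand-off annotations works by storing each annotation as a (start, end) span over the original document and answering "which annotations cover this offset?" in logarithmic time. The sketch below is a generic centered interval tree illustrating that lookup; it is not the authors' Lisp implementation, and the class name, `stab` method, and sample labels are invented here.

```python
class IntervalNode:
    """Minimal centered interval tree over (start, end, label) annotation
    spans, with `end` exclusive (character offsets into the document)."""

    def __init__(self, spans):
        mids = sorted((s + e) / 2 for s, e, _ in spans)
        self.center = mids[len(mids) // 2]
        # Spans covering the center stay here; the rest split left/right.
        self.overlapping = [sp for sp in spans if sp[0] <= self.center < sp[1]]
        left = [sp for sp in spans if sp[1] <= self.center]
        right = [sp for sp in spans if sp[0] > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def stab(self, pos):
        """Return all annotation spans covering character offset `pos`."""
        hits = [sp for sp in self.overlapping if sp[0] <= pos < sp[1]]
        if pos < self.center and self.left:
            hits += self.left.stab(pos)
        elif pos > self.center and self.right:
            hits += self.right.stab(pos)
        return hits

# Hypothetical stand-off annotations over a clinical note.
tree = IntervalNode([(0, 5, "PERSON"), (3, 9, "DISEASE"), (10, 14, "DRUG")])
```

Because annotations live outside the text and refer to it by position, overlapping layers from different NLP tools can coexist, and a single positional query retrieves every layer touching a given offset.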