158 research outputs found

    Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

    This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based technique (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic relational databases for this task through experiments with one of the state-of-the-art systems, called Trio.

    ShreX: managing XML documents in relational databases

    We describe ShreX, a freely available system for shredding, loading and querying XML documents in relational databases. ShreX supports all mapping strategies proposed in the literature as well as strategies available in commercial RDBMSs. It provides generic (mapping-independent) functions for loading shredded documents into relations and for translating XML queries into SQL. ShreX is portable and can be used with any relational database backend.

    ARDI: automatic generation of RDFS models from heterogeneous data sources

    The current wealth of information, typically known as Big Data, generates a large amount of available data for organisations. Data Integration provides foundations to query disparate data sources as if they were integrated into a single source. However, current data integration tools are far from being useful for most organisations due to the heterogeneous nature of data sources, which represents a challenge for current frameworks. To enable data integration of highly heterogeneous and disparate data sources, this paper proposes a method to extract the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the integration canonical data model. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, a tool for implementing our approach has been developed.

    Automatic mapping of XML documents into relational database

    Extensible Markup Language (XML) is nowadays one of the most important standard media for exchanging and representing data over the Internet. Storing, updating and retrieving the huge amount of web services data such as XML is an attractive area of research for researchers and database vendors. In this thesis, we propose and develop a new mapping model, called MAXDOR, for storing, rebuilding, updating and querying XML documents using a relational database without making use of any XML schemas in the mapping process. The model addresses the structural gap between ordered hierarchical XML and unordered tabular relational databases, enabling us to use relational database systems for storing, updating and querying XML data. A multiple linked list is used to maintain the XML document structure, manage the process of updating document contents and retrieve document contents efficiently. Experiments are conducted to evaluate the MAXDOR model. MAXDOR is compared with other well-known models available in the literature (Tatarinov et al., 2002) and (Torsten et al., 2004) using the total expected value of the execution time for rebuilding an XML document and for inserting a token.

    Accelerating data retrieval steps in XML documents

    Complexity bounds for relational algebra over document spanners

    We investigate the complexity of evaluating queries in Relational Algebra (RA) over the relations extracted by regex formulas (i.e., regular expressions with capture variables) over text documents. Such queries, also known as the regular document spanners, were shown to have an evaluation with polynomial delay for every positive RA expression (i.e., consisting of only natural joins, projections and unions); here, the RA expression is fixed and the input consists of both the regex formulas and the document. In this work, we explore the implication of two fundamental generalizations. The first is adopting the “schemaless” semantics for spanners, as proposed and studied by Maturana et al. The second is going beyond the positive RA to allowing the difference operator. We show that each of the two generalizations introduces computational hardness: it is intractable to compute the natural join of two regex formulas under the schemaless semantics, and the difference between two regex formulas under both the ordinary and schemaless semantics. Nevertheless, we propose and analyze syntactic constraints, on the RA expression and the regex formulas at hand, such that the expressive power is fully preserved and, yet, evaluation can be done with polynomial delay. Unlike the previous work on RA over regex formulas, our technique is not (and provably cannot be) based on the static compilation of regex formulas, but rather on an ad-hoc compilation into an automaton that incorporates both the query and the document. This approach also allows us to include black-box extractors in the RA expression.

    Polymorphic Data Modeling

    There are currently no data modeling standards for NoSQL document store databases. This work proposes a standard to fill the void. The proposed standard is based on our new data modeling pattern, named the Polymorphic Table Pattern. The pattern embraces the “schemaless” nature of document store NoSQL while allowing the data modeler to use his or her existing skill sets. The concepts of our proposed modeling have been demonstrated against MongoDB.

    An Access Control Model for NoSQL Databases

    Current development platforms are web scale, unlike recent platforms which were just network scale. A rapid evolution in the computing paradigm has created the need for data storage as agile and scalable as the applications it supports. Relational databases, with their joins and locks, negatively affect performance in web-scale systems. Thus, various types of non-relational databases have emerged in recent years, commonly referred to as NoSQL databases. To fill the gaps left by their relational counterparts, they trade consistency and security for performance and scalability. With NoSQL databases being adopted by an increasing number of organizations, the provision of security for them has become a growing concern. This research presents a context-based abstract model for access control in NoSQL databases that extends traditional role-based access control. The model evaluates and executes security policies which contain versatile access conditions against the dynamic nature of data. The goal is to devise a mechanism for a forward-looking, assertive yet flexible security feature to regulate access to data in a database system that is devoid of rigid structures and consistency, namely a document-based database such as MongoDB.