
    On Fine-Grained Access Control for XML

    Fine-grained access control for XML is about controlling access to XML documents at the granularity of individual elements or attributes. This thesis addresses two problems related to XML access controls. The first is efficient, secure evaluation of XPath expressions. We present a technique that secures path expressions by means of query modification, and we show that the query modification algorithm is correct under a language-independent semantics for secure query evaluation. The second problem is to provide a compact, yet useful, representation of the access matrix. Since determining a user's privilege directly from access control policies can be extremely inefficient, materializing the access matrix---the net effect of the access control policies---is a common approach to speed up authorization decisions. The fine-grained nature of XML access controls, however, makes the space cost of matrix materialization a significant issue. We present a codebook-based technique that records access matrices compactly. Our experimental study shows that the codebook approach exhibits significant space savings over other storage schemes, such as the access control list and the compressed accessibility map. The solutions to these two problems provide a foundation for the development of an efficient mechanism that enforces fine-grained access controls for XML databases in the case of query access.
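
    To make the codebook idea above concrete, the sketch below shows one plausible, deliberately simplified encoding in Python: identical rows of the user-by-node access matrix are stored once, and each node keeps only a small codeword pointing into the codebook. The data layout and function names are illustrative assumptions, not the thesis's actual storage format.

```python
def build_codebook(access_matrix):
    """access_matrix: dict mapping node_id -> tuple of per-user grants (True/False)."""
    codebook = []          # distinct accessibility rows, each stored once
    row_to_code = {}       # row -> its position in the codebook
    node_codes = {}        # node_id -> codeword (small integer)
    for node_id, row in access_matrix.items():
        code = row_to_code.get(row)
        if code is None:
            code = len(codebook)
            codebook.append(row)
            row_to_code[row] = code
        node_codes[node_id] = code
    return codebook, node_codes

def can_access(codebook, node_codes, node_id, user_index):
    # Authorization check: one codebook lookup instead of re-evaluating policies.
    return codebook[node_codes[node_id]][user_index]

# Example: three nodes, two of which share the same accessibility row.
matrix = {"/doc/a": (True, False), "/doc/b": (True, False), "/doc/c": (False, True)}
codebook, codes = build_codebook(matrix)
assert can_access(codebook, codes, "/doc/a", 0) is True
assert len(codebook) == 2   # only two distinct rows are stored
```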

    Query Evaluation in the Presence of Fine-grained Access Control

    Access controls are mechanisms to enhance security by protecting data from unauthorized access. In contrast to traditional access controls, which grant access rights at the granularity of whole tables or views, fine-grained access controls specify access rights at finer granularity, e.g., individual nodes in XML databases and individual tuples in relational databases. While there is a voluminous literature on specifying and modeling fine-grained access controls, less work has been done to address the performance issues of database systems with fine-grained access controls. This thesis addresses the performance issues of fine-grained access controls and proposes corresponding solutions. In particular, the following issues are addressed: effective storage of massive access controls, efficient query planning for secure query evaluation, and accurate cardinality estimation for access-controlled data. Because fine-grained access controls specify access rights from each user to each piece of data in the system, they are effectively a massive matrix whose size is the product of the number of users and the size of the data. Therefore, fine-grained access controls require a very compact encoding to be feasible. The storage system proposed in this thesis achieves an unprecedented level of compactness by leveraging the high correlation of access controls found in real system data. This correlation comes from two sides: the structural similarity of access rights between data items, and the similarity of access patterns across different users. The encoding can be embedded into a linearized representation of XML data so that a query evaluation framework can compute the answer to an access-controlled query with minimal disk I/O on the access controls. Query optimization is a crucial component of database systems. This thesis proposes an intelligent query plan caching mechanism that has lower amortized cost for query planning in the presence of fine-grained access controls. The rationale behind this caching mechanism is that queries, customized by different access controls for different users, may share common upper-level join trees in their optimal query plans. Since join plan generation is an expensive step in query optimization, reusing these upper-level join trees reduces query optimization time significantly. The proposed caching mechanism matches cached plans to access-controlled queries with minimal runtime cost. In the case of a query plan cache miss, the optimizer needs to optimize an access-controlled query from scratch, which depends on accurate cardinality estimation of the sizes of intermediate query results. This thesis proposes a novel sampling scheme that achieves better accuracy than traditional cardinality estimation techniques.
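
    The plan-caching rationale described above can be illustrated with a small sketch. This is an assumption-laden toy, not the thesis's optimizer: queries are represented as dictionaries, the join signature deliberately ignores the per-user access-control predicates, and attach_access_filters is a hypothetical helper standing in for plugging the user's filters back under the reused join tree.

```python
class PlanCache:
    """Caches expensive join-order decisions across users' access-controlled queries."""

    def __init__(self, optimizer):
        self.optimizer = optimizer   # function: query -> join tree (the expensive call)
        self.cache = {}              # join signature -> cached join tree

    @staticmethod
    def join_signature(query):
        # The signature keeps only the joins, so queries that differ solely in
        # their user-specific access-control filters map to the same cache entry.
        return tuple(sorted(map(tuple, query["joins"])))

    def plan(self, query):
        sig = self.join_signature(query)
        join_tree = self.cache.get(sig)
        if join_tree is None:
            join_tree = self.optimizer(query)   # cache miss: optimize from scratch
            self.cache[sig] = join_tree
        return attach_access_filters(join_tree, query.get("acl_filters", []))

def attach_access_filters(join_tree, filters):
    # Hypothetical step: push the user's access-control predicates below the
    # reused join tree without re-running join enumeration.
    return {"joins": join_tree, "filters": filters}
```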

    Rule-based Methodologies for the Specification and Analysis of Complex Computing Systems

    From the origins of hardware and software to the present day, the complexity of computing systems has been a problem that computer scientists, engineers and programmers have had to face. Important research areas have emerged and matured as a result of this effort. In this dissertation we address some of the current lines of research related to the analysis and verification of complex computing systems using formal methods and domain-specific languages. We focus on distributed systems, with particular interest in Web systems and biological systems. The first part of the thesis is devoted to security aspects and related techniques, specifically software certification. We first study resource access control systems and propose a language for specifying access control policies that are strongly associated with knowledge bases and that provide a semantics-aware description of the resources or elements being accessed. We have also developed a novel framework for Code-Carrying Theory, a software certification methodology whose goal is to ensure the safe delivery of code in a distributed environment. Our framework is based on a system for transforming rewriting theories by means of fold/unfold operations. The second part of the thesis concentrates on the analysis and verification of Web systems and biological systems. We propose an information filtering language that supports the retrieval of information from large data repositories; this language uses semantic information obtained from remote ontologies to refine the filtering process. We also study validation methods for checking the consistency of Web content with respect to syntactic and semantic properties. Another contribution is a language for automatically defining and checking semantic and syntactic constraints on the static content of a Web system. Finally, we consider biological systems and focus on a formalism based on rewriting logic for modelling and analysing quantitative aspects of biological processes. To evaluate the effectiveness of all the proposed methodologies, we have paid special attention to the development of prototypes, implemented using rule-based languages. Baggi, M. (2010). Rule-based Methodologies for the Specification and Analysis of Complex Computing Systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8964
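
    As a rough illustration of the rule-based flavour of this work, the toy function below applies a single rewrite rule to a term represented as nested tuples. The actual thesis works with rewriting-logic theories and fold/unfold transformations (e.g., Maude-style specifications), so this Python miniature is only an analogy, with invented representations throughout.

```python
def rewrite_once(term, rules):
    """term: nested tuples such as ('f', ('a',)); rules: list of (match, build) pairs."""
    for match, build in rules:
        binding = match(term)
        if binding is not None:
            return build(binding)                 # apply the first matching rule
    if isinstance(term, tuple) and len(term) > 1:
        head, *args = term
        return (head, *(rewrite_once(a, rules) for a in args))  # otherwise try subterms
    return term

# Example rule: f(x) -> g(x, x), expressed as a (match, build) pair of functions.
rule = (lambda t: {"x": t[1]} if isinstance(t, tuple) and t[:1] == ("f",) and len(t) == 2 else None,
        lambda env: ("g", env["x"], env["x"]))
print(rewrite_once(("f", ("a",)), [rule]))        # ('g', ('a',), ('a',))
```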

    A compact and scalable encoding for updating XML based on node labeling schemes

    The eXtensible Markup Language (XML) has been adopted as the new standard for data exchange on the World Wide Web. As the rate of adoption increases, there is an ever pressing need to store, query and update XML in its native format, thereby eliminating the overhead of parsing and transforming XML in and out of various data formats. However, the hierarchical, ordered and semi-structured properties of the tree structure underlying the XML data model present many challenges to updating XML. In particular, many tree labeling schemes were designed to solve a particular problem or provide a particular feature, often at the expense of other important features. In this dissertation, we identify the core properties that are representative of the desirable characteristics of a good dynamic labeling scheme for XML. We focus on four features central to the outstanding problems in existing dynamic labeling schemes: a compact label encoding, scalability, deleted node label reuse, and a label storage scheme for binary-encoded bit-string node labels. At present there is no dynamic labeling scheme that integrates support for all four features. We present a novel compact and scalable adaptive encoding method that facilitates a highly constrained growth rate of label size under arbitrary node insertion and deletion scenarios and scales efficiently. We deploy our encoding method in two novel dynamic labeling schemes for XML that can completely avoid node relabeling, process frequently skewed insertions gracefully, and reuse deleted node labels.
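
    The core difficulty the thesis addresses, keeping labels orderable and insert-friendly without relabelling, can be sketched with a deliberately naive scheme: give every node a rational label and insert between two siblings by taking the midpoint. This preserves document order under arbitrary insertions, but label size grows quickly under skewed insertions, which is exactly the growth the proposed adaptive encoding is designed to constrain. The sketch is an illustration of the problem, not the thesis's encoding.

```python
from fractions import Fraction

def label_between(left, right):
    # Naive dynamic label: any value strictly between the neighbours preserves order.
    return (left + right) / 2

first, second = Fraction(1), Fraction(2)
inserted = label_between(first, second)       # 3/2, document order preserved
assert first < inserted < second

# Skewed insertion: repeatedly inserting just after `first` makes the label's
# representation longer and longer -- the growth a good scheme must bound.
label = second
for _ in range(5):
    label = label_between(first, label)
print(label)                                  # 33/32
```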

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via an XML labelling scheme in order to permit answers to queries without the need to access the original XML files. As transmission of XML data over the Internet has become widespread, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels as those generated by the combination of the Base-9 scheme and Fibonacci encoding in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein.
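
    Fibonacci coding itself is a standard, self-delimiting prefix code, so a small reference encoder can be sketched without guessing at the Base-9 label format. The function below is the textbook Zeckendorf construction: write the integer as a sum of non-consecutive Fibonacci numbers, emit the bits least-significant first, and append a final '1' so every codeword ends in '11'.

```python
def fibonacci_encode(n):
    """Fibonacci (Zeckendorf) prefix code for a positive integer n."""
    assert n >= 1
    fibs, a, b = [1], 1, 2
    while b <= n:                 # Fibonacci numbers 1, 2, 3, 5, 8, ... up to n
        fibs.append(b)
        a, b = b, a + b
    bits = ["0"] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:  # greedy choice yields the Zeckendorf form
            bits[i] = "1"
            remainder -= fibs[i]
    return "".join(bits) + "1"    # terminating '1' makes the code self-delimiting

# Small labels get short codes; '11' never appears except at the end of a codeword.
print([fibonacci_encode(n) for n in (1, 2, 3, 4, 11)])
# ['11', '011', '0011', '1011', '001011']
```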

    Content Based Image Retrieval (CBIR) in Remote Clinical Diagnosis and Healthcare

    Content-Based Image Retrieval (CBIR) locates, retrieves and displays images similar to one given as a query, using a set of features. It requires accessible data in medical archives and from medical equipment, from which meaning is inferred after some processing. Retrieving cases similar in some sense to the target image can aid clinicians. CBIR complements text-based retrieval and improves evidence-based diagnosis, administration, teaching, and research in healthcare. It facilitates visual/automatic diagnosis and decision-making in real-time remote consultation/screening, store-and-forward tests, home care assistance and overall patient surveillance. Metrics help to compare visual data and improve diagnosis. Specially designed architectures can benefit from the application scenario. CBIR use calls for file storage standardization, querying procedures, efficient image transmission, realistic databases, global availability, access simplicity, and Internet-based structures. This chapter reviews the important and complex aspects required to handle visual content in healthcare.
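
    A generic CBIR loop of the kind summarised above can be sketched in a few lines: reduce each image to a feature vector (here a naive grey-level histogram, an assumption chosen for brevity) and rank the archive by distance to the query's vector. Real clinical systems use far richer features, metrics and DICOM-aware storage; this is only the retrieve-by-feature-distance skeleton.

```python
import math

def grey_histogram(pixels, bins=16, max_value=256):
    """pixels: iterable of grey values in [0, max_value); returns a normalised histogram."""
    counts = [0] * bins
    total = 0
    for p in pixels:
        counts[p * bins // max_value] += 1
        total += 1
    return [c / total for c in counts] if total else counts

def retrieve(query_pixels, archive, top_k=3):
    """archive: dict of image_id -> pixel iterable; returns the ids nearest to the query."""
    q = grey_histogram(query_pixels)

    def distance(item):
        h = grey_histogram(item[1])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, h)))  # Euclidean metric

    return [image_id for image_id, _ in sorted(archive.items(), key=distance)[:top_k]]

# Tiny example with fake 4-pixel "images".
archive = {"scan_1": [10, 12, 200, 210], "scan_2": [11, 13, 198, 205], "scan_3": [90, 95, 100, 105]}
print(retrieve([10, 14, 199, 208], archive, top_k=2))   # scan_1 and scan_2 rank first
```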

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Although several researchers have used statistical methods to show that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could produce different recovery rates for patients. Therefore, this study combines statistical methods and decision tree techniques to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as training data. Therefore, instead of using the resulting tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, using the t-test, we verify these rules to discover useful descriptive rules. Experimental results show that our approach can find new and interesting knowledge about recurrent ovarian endometriomas under different conditions.
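
    One plausible reading of this workflow, sketched below with scikit-learn and SciPy, is: fit a decision tree on the full data set, treat every internal-node threshold as a candidate cut point, split the patients at that cut point, and keep only the cut points whose two groups differ significantly under a t-test. Column roles, the tree depth and the 0.05 significance level are assumptions, not details taken from the paper.

```python
from scipy.stats import ttest_ind
from sklearn.tree import DecisionTreeClassifier

def significant_cut_points(X, y, feature_names, alpha=0.05):
    """X: numpy feature matrix; y: numpy array of outcomes (e.g. recovery = 1, recurrence = 0)."""
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    rules = []
    for node in range(tree.tree_.node_count):
        feature = tree.tree_.feature[node]
        if feature < 0:                      # negative index marks a leaf: no split here
            continue
        threshold = tree.tree_.threshold[node]
        below, above = y[X[:, feature] <= threshold], y[X[:, feature] > threshold]
        if len(below) > 1 and len(above) > 1:
            _, p_value = ttest_ind(below, above, equal_var=False)
            if p_value < alpha:              # keep only statistically supported cut points
                rules.append((feature_names[feature], threshold, p_value))
    return rules
```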

    Indexing collections of XML documents with arbitrary links

    In recent years, the popularity of XML has increased significantly. XML is the extensible markup language of the World Wide Web Consortium (W3C). XML is used to represent data in many areas, such as traditional database management systems, e-business environments, and the World Wide Web. XML data, unlike relational and object-oriented data, need not conform to a fixed schema known in advance, and any schema is stored separately from the data. XML data is self-describing and can model heterogeneity more naturally than relational or object-oriented data models. Moreover, XML data usually has XLinks or XPointers to data in other documents (global links). In addition to XLink or XPointer links, the XML standard allows internal links between different elements in the same XML document using ID/IDREF attributes. The rise in popularity of XML has generated much interest in query processing over graph-structured data. In order to facilitate efficient evaluation of path expressions, structured indexes have been proposed. However, most variants of structured indexes ignore global or internal document references. They assume a tree-like structure of XML documents that does not contain such global and internal links. Extending these indexes to work with large XML graphs that contain global or internal links requires, firstly, a lot of computing power for the creation process; secondly, it would also require a great deal of space in which to store the indexes. Moreover, the efficient evaluation of ancestor-descendant queries over arbitrary graphs with long paths is a complex issue in itself. This thesis proposes the HID index (2-Hop cover path Index based on DAG), which is based on the concept of a two-hop cover for a directed graph. The algorithms proposed for HID index creation in effect scale down the original graph size substantially, producing a directed acyclic graph (DAG) with a smaller number of nodes and edges. This reduces the number of computing steps required for building the index, and it reduces computing time and space as well. The index also permits efficient evaluation of ancestor-descendant relationships. Moreover, the proposed index has an advantage over other comparable indexes: it is optimized for descendants-or-self queries on arbitrary graphs with link relationships, a task that would stress any index structure. Our experiments with real-life XML data show that the HID index provides better performance than other indexes.
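
    The query side of a 2-hop cover is easy to show: an ancestor-descendant (reachability) test reduces to intersecting a small out-label of the ancestor with a small in-label of the descendant. The construction below is a deliberately naive stand-in that stores full reachable sets, so it is correct but not compact; shrinking these labels, by first condensing the graph into a smaller DAG, is precisely the HID index's contribution, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict

def naive_two_hop_labels(edges):
    """edges: iterable of (parent, child) pairs; returns (l_out, l_in) label dictionaries."""
    graph = defaultdict(set)
    nodes = set()
    for u, v in edges:
        graph[u].add(v)
        nodes.update((u, v))
    l_out, l_in = {}, {}
    for u in nodes:
        reachable, stack = {u}, [u]
        while stack:                       # depth-first search from u
            x = stack.pop()
            for y in graph[x]:
                if y not in reachable:
                    reachable.add(y)
                    stack.append(y)
        l_out[u] = reachable               # everything u can reach (including itself)
        l_in[u] = {u}                      # naive in-label; a real cover keeps this small too
    return l_out, l_in

def is_descendant(l_out, l_in, ancestor, node):
    # 2-hop query: reachable iff the out-label and in-label share a hop node.
    return not l_out[ancestor].isdisjoint(l_in[node])

edges = [("doc", "a"), ("a", "b"), ("b", "c"), ("a", "c")]   # includes a non-tree link
l_out, l_in = naive_two_hop_labels(edges)
assert is_descendant(l_out, l_in, "doc", "c")
assert not is_descendant(l_out, l_in, "c", "doc")
```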