152 research outputs found

    Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching

    In the history of databases, eXtensible Markup Language (XML) has been thought of as the standard format to store and exchange semi-structured data. With the advent of IoT, XML technologies can play an important role in addressing the issue of processing the massive amount of data generated by heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms that can index and retrieve XML data efficiently even for complex queries. In this context, twig pattern matching, that is, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they fail to guarantee an optimal evaluation except at the expense of excessive storage costs, which limits their scope on large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach presented here uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms, together with a novel algorithm which avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.

    Twig Pattern Search in XML Database

    Current search engines rank results by popularity. However, the most popular topics are not always what a user wants: millions of people have millions of different preferences ("favors"). The main challenge is therefore how to dig information out of the enormous database of the Internet according to each person's favor. In computer science, a "favor" is a pattern, and we call this "Twig Pattern Search". Unlike index methods that split a query into several sub-queries and then stitch the results together to produce the final answers, twig pattern search uses tree structures as the basic unit of a query to avoid expensive join operations. We present an efficient algorithm for the tree mapping problem in XML databases. Given a target tree T and a pattern tree Q, the algorithm finds all the embeddings of Q in T in O(|D||Q|) time, where D is the largest data stream associated with a node of Q. Master of Science in Applied Computer Science

    Probabilistic XML: Models and Complexity


    Child Prime Label Approaches to Evaluate XML Structured Queries

    The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the properties of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within existing labelling schemes. The major contribution of this thesis is a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.
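The abstract above does not define the Child Prime Label itself, so the following is only a minimal sketch of how prime numbers can, in general, be used to test Parent-Child edges. It assumes a hypothetical scheme in which every node receives its own prime and each parent stores the product of its children's primes; the names `Node`, `child_product`, `build` and `is_parent` are illustrative and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


@dataclass
class Node:
    tag: str
    prime: int                  # assumption: one unique prime per node
    child_product: int = 1      # assumption: product of the children's primes
    children: list = field(default_factory=list)


def build(tree, gen=None):
    """Label a nested (tag, [children]) tuple in document (pre-)order."""
    if gen is None:
        gen = primes()
    tag, kids = tree
    node = Node(tag, next(gen))
    for kid in kids:
        child = build(kid, gen)
        node.children.append(child)
        node.child_product *= child.prime
    return node


def is_parent(parent, child):
    """P-C test: the child's prime divides the parent's child product."""
    return parent.child_product % child.prime == 0


# Toy document: <book><title/><author><name/></author></book>
book = build(("book", [("title", []), ("author", [("name", [])])]))
title, author = book.children
name = author.children[0]

print(is_parent(book, title))   # direct child
print(is_parent(book, name))    # grandchild, not a direct child
print(is_parent(author, name))  # direct child
```

Because every prime is used exactly once, divisibility holds only for direct children, which is what separates a Parent-Child edge from a more general Ancestor-Descendant relationship in this sketch. The products grow quickly with fan-out, so a practical scheme would need to bound or partition them.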

    SIQXC: Schema Independent Queryable XML Compression for Smartphones

    The explosive growth of XML use over the last decade has led to a lot of research on how best to store and access it. This growth has resulted in XML being described as a de facto standard for the storage and exchange of data over the web. However, XML has high redundancy because its self-describing nature makes it verbose, and this verbosity poses a storage problem. This has led to much research devoted to XML compression, which has attracted more interest as the use of resource-constrained devices is also on the rise. These devices are limited in storage space and processing power, and have finite energy. Therefore, they cannot cope with storing and processing large XML documents. Queryable XML compression methods could be a solution, but none of them has a query processor that runs on such devices. Currently, wireless connections are used to alleviate the problem, but they have adverse effects on battery life and are therefore not a sustainable solution. This thesis describes an attempt to address this problem by proposing a queryable compressor (SIQXC) with a query processor that runs in a resource-constrained environment, thereby lowering wireless connection dependency while alleviating the storage problem. It applies a novel, simple 2-tuple integer encoding system, clustering and gzip. SIQXC achieves an average compression ratio of 70%, which is higher than most queryable XML compressors, and also supports a wide range of XPath operators, making it a competitive approach. It was tested through a practical implementation evaluated against the real data that is usually used for XML benchmarking. The evaluation covered the compression ratio, compression time, query evaluation accuracy and response time. SIQXC allows users, to some extent, to locally store and manipulate the otherwise verbose XML on their smartphones.

    An experimental study and evaluation of a new architecture for clinical decision support - integrating the openEHR specifications for the Electronic Health Record with Bayesian Networks

    Healthcare informatics still lacks wide-scale adoption of intelligent decision support methods, despite continuous increases in computing power and methodological advances in scalable computation and machine learning, over recent decades. The potential has long been recognised, as evidenced in the literature of the domain, which is extensively reviewed. The thesis identifies and explores key barriers to adoption of clinical decision support, through computational experiments encompassing a number of technical platforms. Building on previous research, it implements and tests a novel platform architecture capable of processing and reasoning with clinical data. The key components of this platform are the now widely implemented openEHR electronic health record specifications and Bayesian Belief Networks. Substantial software implementations are used to explore the integration of these components, guided and supplemented by input from clinician experts and using clinical data models derived in hospital settings at Moorfields Eye Hospital. Data quality and quantity issues are highlighted. Insights thus gained are used to design and build a novel graph-based representation and processing model for the clinical data, based on the openEHR specifications. The approach can be implemented using diverse modern database and platform technologies. Computational experiments with the platform, using data from two clinical domains – a preliminary study with published thyroid metabolism data and a substantial study of cataract surgery – explore fundamental barriers that must be overcome in intelligent healthcare systems developments for clinical settings. These have often been neglected, or misunderstood as implementation procedures of secondary importance. The results confirm that the methods developed have the potential to overcome a number of these barriers. 
The findings lead to proposals for improvements to the openEHR specifications, in the context of machine learning applications, and in particular for integrating them with Bayesian Networks. The thesis concludes with a roadmap for future research, building on progress and findings to date.

    BAYESIAN APPROACHES TO HUMAN-ROBOT INTERACTION: FROM LANGUAGE GROUNDING TO ACTION LEARNING AND UNDERSTANDING

    In the field of human-robot interaction, the robot is no longer considered a tool but a partner which supports the work of humans. Environments that feature the interaction and collaboration of humans and robots present a number of challenges involving robot learning and interactive capabilities. In order to operate in these environments, the robot must not only be able to act, but also be able to interact and especially to "understand". This thesis proposes a unified probabilistic framework that allows a robot to develop basic cognitive skills essential for collaboration. To this aim we embrace the idea of motor simulation, well established in cognitive science and neuroscience, in which the robot reenacts in simulation its own internal models used for physically performing actions. This particular view offers the possibility of unifying apparently distinct cognitive phenomena such as learning, interaction, understanding and dialogue, just to name a few. The ideas presented here are corroborated by experimental results obtained both in simulation and on a humanoid robotic platform. The first contribution in this direction is a robust Bayesian method to estimate (i.e. learn) the parameters of internal models by observing other skilled actors performing goal-directed actions. In addition to deriving a theoretically sound solution for the learning problem, our approach establishes theoretical links between Bayesian inference and gradient-based optimization methods. Using the expectation propagation (EP) algorithm, a similar algorithm is derived for the scenario with multiple internal models. Once learned, internal models are reused in simulation to "understand" actions performed by other actors, which is a necessary precondition for successful interaction. We have proposed that action understanding can be cast as approximate Bayesian inference in which the covert activity of internal models produces hypotheses that are tested in parallel through a sequential Monte Carlo approach. Here, approximate Bayesian inference is offered as a plausible mechanistic implementation of the idea of motor simulation, making it feasible in real time and with limited resources. Finally, we have investigated how the robot can learn a grounded language model in order to be bootstrapped into communication. Features extracted from the learned internal models, as well as descriptors of various perceptual categories, are fed into a novel multi-instance semi-supervised learning algorithm able to perform semantic clustering and associate words, either nouns or verbs, with their grounded meaning.

    KINE[SIS]TEM'17 From Nature to Architectural Matter

    Kine[SiS]tem – From Kinesis + System. Kinesis is a non-linear movement or activity of an organism in response to a stimulus. A system is a set of interacting and interdependent agents forming a complex whole, delineated by its spatial and temporal boundaries and influenced by its environment. How can architectural systems moderate the external environment to enhance comfort conditions in a simple, sustainable and smart way? This is the starting question for the Kine[SiS]tem’17 – From Nature to Architectural Matter International Conference. For decades, architectural design was developed despite (and not with) the climate, based on mechanical heating and cooling. Today, the argument for net zero energy buildings calls for very effective strategies to reduce energy requirements. The challenge ahead requires design processes that are built upon consolidated knowledge, make use of advanced technologies and are inspired by nature. These design processes should lead to responsive smart systems that deliver the best performance in each specific design scenario. Controlling solar radiation is one key factor in low-energy thermal comfort. Computationally controlled, sensor-based kinetic surfaces are one of the possible answers for controlling solar energy effectively, within the scope of contradictory objectives throughout the year.