152 research outputs found
Solving the intractable problem: optimal performance for worst case scenarios in XML twig pattern matching
In the history of databases, eXtensible Markup Language (XML) has been thought of as the standard format to store and exchange semi-structured data. With the advent of IoT, XML technologies can play an important role in addressing the issue of processing a massive amount of data generated from heterogeneous devices. As the number and complexity of such datasets increase, there is a need for algorithms which are able to index and retrieve XML data efficiently even for complex queries. In this context, twig pattern matching, finding all occurrences of a twig pattern query (TPQ), is a core operation in XML query processing. Until now, holistic joins have been considered the state-of-the-art TPQ processing algorithms, but they fail to guarantee an optimal evaluation except at the expense of excessive storage costs which limit their scope in large datasets. In this article, we introduce a new approach which significantly outperforms earlier methods in terms of both the size of the intermediate storage and query running time. The approach presented here uses Child Prime Labels (Alsubai & North, 2018) to improve the filtering phase of bottom-up twig matching algorithms and a novel algorithm which avoids the use of stacks, thus improving TPQ processing efficiency. Several experiments were conducted on common benchmarks such as the DBLP, XMark and TreeBank datasets to study the performance of the new approach. Multiple analyses on a range of twig pattern queries are presented to demonstrate the statistical significance of the improvements.
Twig Pattern Search in XML Database
Current search engines rank results by popularity. However, the most popular topics are not always what a user wants. Millions of people have millions of different preferences. The main challenge, therefore, is how to retrieve information from the enormous database of the Internet according to each person's preferences.
In computer science, such a preference is expressed as a pattern, and we call the task "Twig Pattern Search". Unlike index methods that split a query into several sub-queries and then stitch the results together to produce the final answers, twig pattern search uses the tree structure as the basic unit of the query to avoid expensive join operations.
We present an efficient algorithm for the tree mapping problem in XML databases. Given a target tree T and a pattern tree Q, the algorithm can find all the embeddings of Q in T in O(|D||Q|) time, where D is the largest data stream associated with a node of Q. Master of Science in Applied Computer Scienc
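To make the notion of "all the embeddings of Q in T" concrete, the following is a minimal sketch of a naive enumeration, not the O(|D||Q|) algorithm the abstract describes. It assumes that every pattern edge is an ancestor-descendant edge and that nodes match when their labels are equal; the `Node` class and the function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """A labelled tree node (hypothetical structure for this sketch)."""
    label: str
    children: list = field(default_factory=list)


def descendants(t):
    """Yield every proper descendant of data node t."""
    for child in t.children:
        yield child
        yield from descendants(child)


def embeddings_at(q, t):
    """Return all embeddings of pattern q whose root is mapped to data node t.

    Each embedding is a list of (pattern_label, data_node) pairs.
    Pattern edges are treated as ancestor-descendant edges (an assumption).
    """
    if q.label != t.label:
        return []
    per_child = []
    for qc in q.children:
        # Collect every way to embed this pattern child below t.
        matches = [m for d in descendants(t) for m in embeddings_at(qc, d)]
        if not matches:
            return []  # one unmatched branch rules out the whole twig
        per_child.append(matches)
    results = []
    # Combine one match per pattern child into a full embedding.
    for combo in product(*per_child):
        embedding = [(q.label, t)]
        for part in combo:
            embedding.extend(part)
        results.append(embedding)
    return results


def all_embeddings(q, root):
    """Enumerate embeddings of q rooted at any node of the data tree."""
    found = embeddings_at(q, root)
    for d in descendants(root):
        found.extend(embeddings_at(q, d))
    return found
```

For example, with the data tree a(b, c(b)) and the pattern a(b), `all_embeddings` returns two embeddings, one for each `b` below `a`. This naive version revisits subtrees repeatedly, which is the kind of redundant work that dedicated twig matching algorithms are designed to avoid.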
Child Prime Label Approaches to Evaluate XML Structured Queries
The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the properties of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.
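The abstract does not spell out how the labels are built, so the following is only a rough illustration of how a prime-number property could expose Parent-Child edges; it should not be read as the thesis's actual CPL scheme. The sketch assumes that each distinct tag is assigned a distinct prime and that an element's label is the product of the primes of its children's tags, so a divisibility test answers "does this element have a child with tag X?". All names here are hypothetical.

```python
def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def assign_tag_primes(tags):
    """Map each distinct tag to its own prime, in first-seen order."""
    distinct = list(dict.fromkeys(tags))
    return dict(zip(distinct, first_primes(len(distinct))))


def label_tree(tree, tag_prime, out=None):
    """Label every element of a (tag, [subtrees]) tree in preorder.

    The label is the product of the primes of the distinct child tags
    (an assumed encoding for this sketch); a leaf gets label 1.
    """
    if out is None:
        out = []
    tag, children = tree
    label = 1
    for child_tag, _ in children:
        p = tag_prime[child_tag]
        if label % p != 0:  # multiply each distinct child tag only once
            label *= p
    out.append((tag, label))
    for child in children:
        label_tree(child, tag_prime, out)
    return out


def has_child_with_tag(label, tag, tag_prime):
    """P-C test: the parent's label is divisible by the child's tag prime."""
    return label % tag_prime[tag] == 0
```

Under these assumptions, a query edge such as book/author can be checked against a `book` element with one modulo operation, without visiting its children, which is the kind of early filtering the abstract attributes to prime-based labels.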
SIQXC: Schema Independent Queryable XML Compression for Smartphones
The explosive growth of XML use over the last decade has led to a lot of research on how to best store and access it. This growth has resulted in XML being described as a de facto standard for storage and exchange of data over the web. However, XML has high redundancy because of its self-describing nature, making it verbose. The verbose nature of XML poses a storage problem. This has led to much research devoted to XML compression. It has become of more interest since the use of resource constrained devices is also on the rise. These devices are limited in storage space, processing power and also have finite energy. Therefore, these devices cannot cope with storing and processing large XML documents. XML queryable compression methods could be a solution but none of them has a query processor that runs on such devices. Currently, wireless connections are used to alleviate the problem but they have adverse effects on the battery life. They are therefore not a sustainable solution.
This thesis describes an attempt to address this problem by proposing a queryable compressor (SIQXC) with a query processor that runs in a resource constrained environment, thereby lowering wireless connection dependency while still alleviating the storage problem. It applies a novel, simple 2-tuple integer encoding system, clustering and gzip. SIQXC achieves an average compression ratio of 70%, which is higher than most queryable XML compressors, and also supports a wide range of XPath operators, making it a competitive approach. It was tested through a practical implementation evaluated against the real data that is usually used for XML benchmarking. The evaluation covered the compression ratio, compression time, query evaluation accuracy and response time. SIQXC allows users, to some extent, to locally store and manipulate the otherwise verbose XML on their Smartphones.
An experimental study and evaluation of a new architecture for clinical decision support - integrating the openEHR specifications for the Electronic Health Record with Bayesian Networks
Healthcare informatics still lacks wide-scale adoption of intelligent decision
support methods, despite continuous increases in computing power and
methodological advances in scalable computation and machine learning, over
recent decades. The potential has long been recognised, as evidenced in the
literature of the domain, which is extensively reviewed.
The thesis identifies and explores key barriers to adoption of clinical decision
support, through computational experiments encompassing a number of technical
platforms. Building on previous research, it implements and tests a novel platform
architecture capable of processing and reasoning with clinical data. The key
components of this platform are the now widely implemented openEHR electronic
health record specifications and Bayesian Belief Networks.
Substantial software implementations are used to explore the integration of
these components, guided and supplemented by input from clinician experts and
using clinical data models derived in hospital settings at Moorfields Eye Hospital.
Data quality and quantity issues are highlighted. Insights thus gained are used to
design and build a novel graph-based representation and processing model for the
clinical data, based on the openEHR specifications. The approach can be
implemented using diverse modern database and platform technologies.
Computational experiments with the platform, using data from two clinical
domains – a preliminary study with published thyroid metabolism data and a
substantial study of cataract surgery – explore fundamental barriers that must be
overcome in intelligent healthcare systems developments for clinical settings. These
have often been neglected, or misunderstood as implementation procedures of
secondary importance. The results confirm that the methods developed have the
potential to overcome a number of these barriers.
The findings lead to proposals for improvements to the openEHR
specifications, in the context of machine learning applications, and in particular for
integrating them with Bayesian Networks. The thesis concludes with a roadmap for
future research, building on progress and findings to date.
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity. ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909
BAYESIAN APPROACHES TO HUMAN-ROBOT INTERACTION: FROM LANGUAGE GROUNDING TO ACTION LEARNING AND UNDERSTANDING
In the field of human-robot interaction, the robot is no longer considered as a tool but as a
partner, which supports the work of humans. Environments that feature the interaction
and collaboration of humans and robots present a number of challenges involving robot
learning and interactive capabilities. In order to operate in these environments, the robot
must not only be able to do, but also be able to interact and especially to “understand”.
This thesis proposes a unified probabilistic framework that allows a robot to develop
basic cognitive skills essential for collaboration. To this aim we embrace the idea of motor
simulation - well established in cognitive science and neuroscience - in which the robot
reenacts in simulation its own internal models used for physically performing action. This
particular view offers the possibility to unify apparently distinct cognitive phenomena such
as learning, interaction, understanding and dialogue, just to name a few. Ideas presented
here are corroborated by experimental results performed both in simulation and on a
humanoid robotic platform.
The first contribution in this direction is a robust Bayesian method to estimate (i.e.
learn) the parameters of internal models by observing other skilled actors performing
goal-directed actions. In addition to deriving a theoretically sound solution for the learning
problem, our approach establishes theoretical links between Bayesian inference and
gradient-based optimization methods. Using the expectation propagation (EP) algorithm,
a similar algorithm is derived for the scenario with multiple internal models.
Once learned, internal models are reused in simulation to “understand” actions performed
by other actors, which is a necessary precondition for successful interaction. We
have proposed that action understanding can be cast as an approximate Bayesian inference
in which the covert activity of internal models produces hypotheses that are tested
in parallel through a sequential Monte Carlo approach. Here, approximate Bayesian inference
is offered as a plausible mechanistic implementation of the idea of motor simulation
making it feasible in real-time and with limited resources.
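As a rough illustration of testing competing hypotheses with a sequential Monte Carlo scheme, the sketch below runs a very small particle filter over candidate goals. It is a toy under stated assumptions and not the thesis's method: the one-dimensional world, the two goals in `GOALS`, the `predict` model and the Gaussian likelihood are all hypothetical stand-ins for the robot's learned internal models.

```python
import math
import random
from collections import Counter

# Hypothetical candidate goals in a 1-D world (assumption for this sketch).
GOALS = {"reach_right": 10.0, "reach_left": -10.0}


def predict(position, goal, step=1.0):
    """Toy internal model: move one fixed step towards the goal."""
    if goal > position:
        return position + step
    if goal < position:
        return position - step
    return position


def likelihood(observed, predicted, sigma=0.5):
    """Unnormalised Gaussian likelihood of the observed position."""
    return math.exp(-((observed - predicted) ** 2) / (2 * sigma ** 2))


def infer_goal(trajectory, n_particles=100, seed=0):
    """Estimate a posterior over goals from an observed trajectory.

    Each particle carries one goal hypothesis. At every step the particles
    are weighted by how well their internal model predicted the observation
    and are then resampled in proportion to those weights.
    """
    rng = random.Random(seed)
    names = list(GOALS)
    particles = [rng.choice(names) for _ in range(n_particles)]
    for previous, observed in zip(trajectory, trajectory[1:]):
        weights = [
            likelihood(observed, predict(previous, GOALS[h])) for h in particles
        ]
        particles = rng.choices(particles, weights=weights, k=n_particles)
    counts = Counter(particles)
    return {h: counts[h] / n_particles for h in names}
```

Calling `infer_goal([0, 1, 2, 3, 4, 5])` concentrates almost all of the particle mass on `reach_right`, since only that hypothesis predicts the observed rightward steps. Because the work per observation is bounded by the number of particles, the cost can be tuned to the available resources, which is consistent with the real-time motivation in the text.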
Finally, we have investigated how the robot can learn a grounded language model
in order to be bootstrapped into communication. Features extracted from the learned
internal models, as well as descriptors of various perceptual categories, are fed into a novel
multi-instance semi-supervised learning algorithm able to perform semantic clustering and
associate words, either nouns or verbs, with their grounded meaning.
KINE[SIS]TEM'17 From Nature to Architectural Matter
Kine[SiS]tem – From Kinesis + System. Kinesis is a non-linear movement or activity of an organism in response to a stimulus. A system is a set of interacting and interdependent agents forming a complex whole, delineated by its spatial and temporal boundaries, influenced by its environment.
How can architectural systems moderate the external environment to enhance comfort conditions in a simple, sustainable and smart way?
This is the starting question for the Kine[SiS]tem’17 – From Nature to Architectural Matter International Conference. For decades, architectural design was developed despite (and not with) the climate, based on mechanical heating and cooling. Today, the argument for net zero energy buildings needs very effective strategies to reduce energy requirements. The challenge ahead requires design processes that are built upon consolidated knowledge, make use of advanced technologies and are inspired by nature. These design processes should lead to responsive smart systems that deliver the best performance in each specific design scenario.
Controlling solar radiation is one key factor in low-energy thermal comfort. Computationally controlled, sensor-based kinetic surfaces are one of the possible answers to control solar energy in an effective way, within the scope of contradictory objectives throughout the year. FC