Grid-based semantic integration of heterogeneous data resources: Implementation on a HealthGrid
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. The semantic integration of geographically distributed and heterogeneous data
resources still remains a key challenge in Grid infrastructures. Today's
mainstream Grid technologies hold the promise to meet this challenge in a
systematic manner, making data applications more scalable and manageable. The
thesis conducts a thorough investigation of the problem, the state of the art, and
the related technologies, and proposes an Architecture for Semantic Integration of
Data Sources (ASIDS) addressing the semantic heterogeneity issue. It defines a
simple mechanism for the interoperability of heterogeneous data sources in order
to extract or discover information regardless of their different semantics. The
constituent technologies of this architecture include Globus Toolkit (GT4) and
OGSA-DAI (Open Grid Services Architecture Data Access and Integration)
alongside other web services technologies such as XML (Extensible Markup
Language). To show this, the ASIDS architecture was implemented and tested in a
realistic setting by building an exemplar application prototype on a HealthGrid
(pilot implementation).
The study followed an empirical research methodology and was informed by
extensive literature surveys and a critical analysis of the relevant technologies and
their synergies. The two literature reviews, together with the analysis of the
technology background, have provided a good overview of the current Grid and
HealthGrid landscape, produced some valuable taxonomies, explored new paths
by integrating technologies, and more importantly illuminated the problem and
guided the research process towards a promising solution. Yet the primary
contribution of this research is an approach that uses contemporary Grid
technologies for integrating heterogeneous data resources that have semantically
different data fields (attributes). It has been practically demonstrated (using a
prototype HealthGrid) that discovery in semantically integrated distributed data
sources can be feasible by using mainstream Grid technologies, which have been
shown to have some significant advantages over non-Grid-based approaches.
Experimental Evaluation of Growing and Pruning Hyper Basis Function Neural Networks Trained with Extended Information Filter
In this paper we test the Extended Information Filter (EIF) for sequential training of Hyper Basis Function Neural Networks with growing and pruning ability (HBF-GP). The HBF neuron allows different scaling of input dimensions, which provides better generalization when dealing with complex nonlinear problems in engineering practice. The main intuition behind HBF is a generalization of the Gaussian type of neuron that applies a Mahalanobis-like distance as the distance metric between an input training sample and a prototype vector. We exploit the concept of a neuron’s significance and allow growing and pruning of HBF neurons during the sequential learning process. From an engineer’s perspective, EIF is attractive for training neural networks because it allows a designer to have scarce initial knowledge of the system/problem. An extensive experimental study shows that an HBF neural network trained with EIF achieves the same prediction error and compactness of network topology as one trained with the Extended Kalman Filter (EKF), but without the need to know the initial state uncertainty, which is its main advantage over EKF.
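The Gaussian-type unit with per-dimension scaling described in this abstract can be sketched roughly as follows. This is only an illustration under a simplifying assumption: the scaling is diagonal (one non-negative weight per input dimension), whereas the paper's HBF neuron may use a more general form. The function name and parameters are hypothetical.

```python
import math

def hbf_activation(x, center, scales):
    """Gaussian-type unit with a per-dimension (diagonal) scaling.

    Assumption for illustration: the Mahalanobis-like distance is
    restricted to a diagonal weighting, one scale per input dimension.
    """
    # Weighted squared distance between the input and the prototype vector.
    d2 = sum(s * (xi - ci) ** 2 for xi, ci, s in zip(x, center, scales))
    return math.exp(-d2)
```

With all scales equal, this reduces to an ordinary radial Gaussian unit; setting a scale to zero makes the unit insensitive to that input dimension.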
Schema matching in a peer-to-peer database system
Includes bibliographical references (p. 112-118). Peer-to-peer or P2P systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings need to exist, which are mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has not been used much in dynamic environments, such as P2P networks. Therefore, this thesis investigates how to enable effective semi-automated schema matching within a P2P network.
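As a rough illustration of semi-automatic schema matching, the sketch below proposes candidate mappings between two peers' attribute names using plain string similarity. It is not the thesis's method: the similarity measure (Python's difflib), the threshold value, and the attribute names are all assumptions chosen for the example, and the proposed candidates would still need human confirmation.

```python
from difflib import SequenceMatcher

def match_schemas(local_attrs, remote_attrs, threshold=0.6):
    """Propose, for each local attribute, the most similar remote attribute.

    Returns {local_attr: (remote_attr, score)} for pairs whose name
    similarity reaches the (assumed) threshold.
    """
    matches = {}
    for a in local_attrs:
        best, best_score = None, 0.0
        for b in remote_attrs:
            # Case-insensitive similarity of the attribute names only.
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            matches[a] = (best, round(best_score, 2))
    return matches

candidates = match_schemas(["patient_name", "zzz"], ["PatientName", "dob"])
```

Name similarity alone misses semantically equivalent attributes with unrelated names, which is one reason matching is usually only semi-automated.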
Improving Salience Retention and Identification in the Automated Filtering of Event Log Messages
Event log messages are currently the only genuine interface through which computer systems
administrators can effectively monitor their systems and assemble a mental perception
of system state. The popularisation of the Internet and the accompanying meteoric
growth of business-critical systems has resulted in an overwhelming volume of event log
messages, channeled through mechanisms whose designers could not have envisaged the
scale of the problem. Messages regarding intrusion detection, hardware status, operating
system status changes, database tablespaces, and so on, are being produced at the rate
of many gigabytes per day for a significant computing environment.
Filtering technologies have not been able to keep up. Most messages go unnoticed; no
filtering whatsoever is performed on them, at least in part due to the difficulty of implementing
and maintaining an effective filtering solution. The most commonly-deployed
filtering alternatives rely on regular expressions to match pre-defined strings, with 100%
accuracy, which can then become ineffective as the code base for the software producing
the messages 'drifts' away from those strings. The exactness requirement means all possible
failure scenarios must be accurately anticipated and their events catered for with
regular expressions, in order to make full use of this technique.
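The brittleness of exact regular-expression matching can be illustrated with a minimal sketch. The message formats and the pattern below are hypothetical and are not taken from the thesis.

```python
import re

# Pattern written against the wording used by one (hypothetical) software release.
DISK_FULL = re.compile(r"^ERROR: disk (\w+) is full$")

def is_alert(message):
    """Return True only if the message matches the pre-defined string exactly."""
    return DISK_FULL.match(message) is not None

old_wording = "ERROR: disk sda1 is full"
new_wording = "ERROR: no space left on disk sda1"  # same event after a rewording
```

Here `is_alert(old_wording)` succeeds while `is_alert(new_wording)` silently fails, even though both messages report the same condition.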
Alternatives to regular expressions remain largely academic. Data mining, automated
corpus construction, and neural networks, to name the highest-profile ones, only produce
probabilistic results and are either difficult or impossible to alter in any deterministic way.
Policies are therefore not supported under these alternatives.
This thesis explores a new architecture which utilises rich metadata in order to avoid the
burden of message interpretation. The metadata itself is based on an intention to improve
end-to-end communication and reduce ambiguity. A simple yet effective filtering scheme
is also presented which filters log messages through a short and easily-customisable set
of rules. With such an architecture, it is envisaged that systems administrators could
significantly improve their awareness of their systems while avoiding many of the false-positives
and -negatives which plague today's filtering solutions.
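One way to picture filtering on metadata rather than on message text is a short, ordered rule list in which the first matching rule decides the action. The sketch below is an assumption-laden illustration only: the metadata fields (`severity`, `component`), the actions, and the first-match policy are hypothetical and do not come from the thesis.

```python
# Hypothetical rules: (conditions on metadata fields, action to take).
RULES = [
    ({"severity": "critical"}, "page"),
    ({"component": "database", "severity": "warning"}, "ticket"),
    ({"severity": "debug"}, "drop"),
]
DEFAULT_ACTION = "archive"

def classify(metadata):
    """Return the action of the first rule whose conditions all match."""
    for conditions, action in RULES:
        if all(metadata.get(key) == value for key, value in conditions.items()):
            return action
    return DEFAULT_ACTION
```

Because the rules test structured fields deterministically, an administrator can change policy by editing one entry, without re-interpreting free-text messages.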