Dynamic Integration of Evolving Distributed Databases using Services
This thesis investigates the integration of many separate, heterogeneous, and distributed existing databases which, due to organizational changes, must be merged and appear as one database. It presents a solution to some database evolution problems: an Evolution Adaptive Service-Oriented Data Integration Architecture (EA-SODIA) that dynamically integrates heterogeneous and distributed source databases, aiming to minimize the cost of the maintenance caused by database evolution.
An algorithm, named Relational Schema Mapping by Views (RSMV), is designed to integrate source databases that are exposed as services into a pre-designed global schema held in a data integrator service. Instead of producing hard-coded programs, views are built using relational algebra operations to eliminate the heterogeneities among the source databases. More importantly, the definitions of those views are represented and stored in the meta-database, with constraints to test their validity. A method called Evolution Detection can then identify in the meta-database the views affected by evolutions and modify them automatically.
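The abstract gives no code, but the core idea of storing view definitions as inspectable metadata, rather than as hard-coded programs, can be sketched roughly as follows. This is a minimal illustration only; the names (SourceView, meta_db, affected_views) and the simplified view structure are assumptions for exposition, not the RSMV implementation.

    from dataclasses import dataclass, field

    @dataclass
    class SourceView:
        """A global-schema view defined by relational algebra over a source table."""
        name: str                      # view name in the global schema
        source_table: str              # underlying source-database table
        projection: list               # attributes kept (relational projection)
        renames: dict = field(default_factory=dict)  # source attr -> global attr

    # The "meta-database": view definitions stored as data, so they can be
    # inspected and revised when a source database evolves.
    meta_db = [
        SourceView("customer", "crm.clients", ["id", "cname"], {"cname": "name"}),
        SourceView("orders", "sales.orders", ["id", "customer_id", "total"]),
    ]

    def affected_views(evolved_table, dropped_attrs):
        """Evolution Detection (sketch): find views invalidated by a source change."""
        return [v for v in meta_db
                if v.source_table == evolved_table
                and any(a in v.projection for a in dropped_attrs)]

    # Example: the source drops column 'cname'; the customer view must be revised.
    print([v.name for v in affected_views("crm.clients", ["cname"])])

Because the definitions live in queryable metadata rather than compiled code, detecting and repairing affected views can be automated, which is the maintenance saving the thesis claims.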
An evaluation is presented using case studies. Firstly, it is shown that most types of heterogeneity defined in this thesis can be eliminated by RSMV, except semantic conflict. Secondly, it shows that little manual modification of the system is required as long as the evolutions follow the rules; for only three types of database evolution is human intervention required, and some existing views are discarded. Thirdly, the computational cost of the automatic modification shows a slow linear growth in the number of source databases. Other characteristics addressed include EA-SODIA's scalability, domain independence, autonomy of source databases, and the potential to involve other data sources (e.g. XML). Finally, a descriptive comparison with other data integration approaches is presented. It shows that although other approaches may provide better query-processing performance in some circumstances, the service-oriented architecture provides better autonomy, flexibility, and capability to evolve.
A network approach for managing and processing big cancer data in clouds
Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. To address the issues that data is decentralised, growing, and continually being updated, and that content residing or archived on different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach to data processing and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for the management and processing of big cancer data.
State-of-the-art on evolution and reactivity
This report starts by, in Chapter 1, outlining aspects of querying and updating resources on the Web and on the Semantic Web, including the development of query and update languages to be carried out within the Rewerse project.
From this outline, it becomes clear that several existing research areas and topics are of interest for this work in Rewerse. In the remainder of this report we further present state-of-the-art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs; in Chapter 4 event-condition-action rules, both in the context of active database systems and in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agent frameworks.
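To make the Chapter 4 topic concrete: an event-condition-action (ECA) rule fires when an event arrives, tests a condition, and runs an action. The sketch below is a generic illustration of that shape, not any system surveyed in the report; the rule, table, and payload fields are invented for the example.

    # Minimal event-condition-action (ECA) rule sketch.
    def make_rule(event_type, condition, action):
        """Build a handler that fires the action when event type and condition match."""
        def handle(event):
            if event["type"] == event_type and condition(event):
                action(event)
        return handle

    # Illustrative rule: audit any inserted order whose total exceeds 1000.
    audit_rule = make_rule(
        "insert",
        condition=lambda e: e["table"] == "orders" and e["row"]["total"] > 1000,
        action=lambda e: print("audit large order:", e["row"]),
    )

    # Dispatching an event, much as an active database would on an INSERT trigger.
    audit_rule({"type": "insert", "table": "orders", "row": {"total": 2500}})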
Supporting Change-Aware Semantic Web Services
The Semantic Web is not only evolving into a provider of structured meaningful content and knowledge representation, but also into a provider of services. While most of these services support external users of the SW, we focus on a vital service within the SW: change management and adaptation. Change is a ubiquitous feature of the SW. In this paper, we propose a service architecture that embraces and utilises change to provide higher-quality services. We introduce pilot implementations of two supporting services within this architecture.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider while making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
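The four data models the paper compares can be contrasted with a single toy record. The layouts below are illustrative simplifications in plain Python, not any specific product's API or storage format.

    # One toy user record under the four NoSQL data models discussed above.

    # Document-oriented (e.g. MongoDB-style): a self-contained nested document.
    document = {"_id": "u42", "name": "Ada", "orders": [{"id": 1, "total": 99.0}]}

    # Key-value (e.g. Redis-style): an opaque value addressed only by its key.
    key_value = {"user:u42": '{"name": "Ada"}'}

    # Wide-column (e.g. Cassandra-style): a row key mapping to column families.
    wide_column = {"u42": {"profile": {"name": "Ada"}, "orders": {"1": "99.0"}}}

    # Graph (e.g. Neo4j-style): explicit nodes and typed relationships.
    nodes = [("u42", {"name": "Ada"}), ("o1", {"total": 99.0})]
    edges = [("u42", "PLACED", "o1")]

Which layout fits depends on the access pattern: documents favour retrieving a whole entity at once, key-value favours simple lookups at scale, wide-column favours sparse rows with many columns, and graphs favour traversing relationships.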
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project and describes the progress in the first six months. The project aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; and developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene.
Public or private economies of knowledge: The economics of diffusion and appropriation of bioinformatics tools
The past three decades have witnessed a period of great turbulence in the economies of biological knowledge, during which there has been great uncertainty as to how and where boundaries could be drawn between public and private knowledge, especially with regard to the explosive growth of biological databases and their related bioinformatic tools. This paper focuses on some of the key software tools developed in relation to bio-databases. It argues that bioinformatic tools are particularly economically unstable, and that there is a continuing tension and competition between their public and private modes of production, appropriation, distribution, and use. The paper adopts an 'instituted economic process' approach and elaborates on processes of making knowledge public in the creation of 'public goods'. The question is one of continuously creating and sustaining new institutions of the commons. We believe this is critical to an understanding of the division and interdependency between public and private economies of knowledge.
From access and integration to mining of secure genomic data sets across the grid
The UK Department of Trade and Industry (DTI) funded BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services) has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity in access to, and integration of, a myriad of genomic data sets through simple Grid-based tools. We outline these tools and how they are delivered to the end-user scientists. We also describe how these tools are to be extended in the BBSRC-funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to support a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.