88 research outputs found

    Code Generation for Big Data Processing in the Web using WebAssembly

    Get PDF
    Traditional clusters for cloud computing are quite hard to configure and set up, and the number of cluster nodes is limited by the hardware available in the cluster. We therefore envision the concept of a Browser Cloud: one only has to visit a certain webpage with a web browser in order to connect his or her computer to the Browser Cloud. In this way the setup of the Browser Cloud is much easier than that of traditional clouds. Furthermore, the Browser Cloud has a much larger number of potential nodes, as any computer running a browser may connect to and be integrated into the Browser Cloud. New challenges arise when setting up a cloud from web browsers: data is processed within the browser, which requires using the technologies offered by the browser for this purpose. The typically used JavaScript runtime environment may be too slow, because JavaScript is an interpreted language. Hence we investigate the possibilities for computing the work-intensive part of query processing inside a virtual machine of the web browser. The WebAssembly virtual machine technology has recently gained support in all major browsers and promises large speedups in comparison with JavaScript. Recent approaches to efficient Big Data processing generate code for the data processing steps of queries. To run the generated code in a WebAssembly virtual machine, an online compiler is needed to generate WebAssembly bytecode from the generated code. Hence our main contribution is an online compiler to WebAssembly bytecode, developed especially to run in the web browser and aimed at Big Data processing based on code generation of the processing steps. In our experiments, the runtimes of Big Data processing using JavaScript are compared with those of the WebAssembly technologies in the major web browsers.
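
    As a minimal illustration of the execution model described above, the following sketch (not taken from the thesis) shows how bytecode emitted by such an online compiler could be instantiated and invoked in the browser using the standard WebAssembly JavaScript API; the file name query_step.wasm and the exported function filterRows are hypothetical placeholders for a code-generated query processing step.

    // TypeScript sketch: run a generated WebAssembly processing step in the browser.
    async function runGeneratedStep(): Promise<void> {
      // Fetch the bytecode produced by the online compiler and compile/instantiate it
      // with the standard WebAssembly JavaScript API.
      const response = await fetch("query_step.wasm"); // hypothetical compiler output
      const bytes = await response.arrayBuffer();
      const { instance } = await WebAssembly.instantiate(bytes);

      // Call the exported processing function; here it is assumed to take the number of
      // input rows and to return the number of rows that pass a generated filter.
      const filterRows = instance.exports.filterRows as (n: number) => number;
      console.log("rows after filter:", filterRows(1_000_000));
    }

    runGeneratedStep();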

    A cooperative framework for molecular biology database integration using image object selection

    Get PDF
    The theme and the concept of 'Molecular Biology Database Integration' and the problems associated with this concept initiated the idea for this Ph.D. research. The available technologies make it possible to analyse the data independently and discretely, but they fail to integrate the data resources into more meaningful information. This, along with the integration issues, created the scope for this Ph.D. research. The research has reviewed the 'database interoperability' problems and has suggested a framework for integrating the molecular biology databases. The framework proposes to develop a cooperative environment in which the molecular biology databases share information on the basis of a common purpose. The research has also reviewed other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. The research addresses the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementations; multi-database query using image object keying; database integration technologies using a context graph; and automated navigation among these databases. This thesis introduces a new approach for database implementation. It introduces an interoperable component database concept to initiate multidatabase queries on gene mutation data. A number of data models have been proposed for gene mutation data, which are the basis for integrating the target-specific component database with the federated information system. The proposed data models are: data models for genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows the non-redundant integration approach explained in the thesis. This is achieved by storing only attributes that are neither a union nor an intersection of any attributes that exist in public domain molecular biology databases. Unlike the data warehousing technique, this feature is quite unique and novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools, which are: a metadata extractor, a mapping linker, a query generator and a result interpreter. These tools are used for a transparent integration without creating any global schema of the participating databases. The thesis also establishes the concept of image object keying for multidatabase query and proposes a relevant algorithm for matching protein spots in gel electrophoresis images. An object spot in a gel electrophoresis image initiates the query when it is selected by the user; the selected spot is then matched with similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase querying, which requires writing complex SQL scripts, and it also resolves the semantic conflicts that exist among molecular biology databases. The research has proposed a new framework based on the context of the web data for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) increases interoperability by providing the description of the resources and the navigation plan for accessing the web-based databases.
A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. The interactions within the resources are achieved by utilising an integration domain to extract the required information in a single instance and without writing any query scripts. The integration domain allows the user to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target websites and unifies them as a whole object in a single page. The proposed framework is tested by finding specific information, e.g., information on Alzheimer's disease, from public domain biology resources such as the Protein Data Bank, the Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis gives further propositions and plans for future work.
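
    As a hedged illustration only, the following sketch (not from the thesis) expresses the has, provide and access constructs as plain resource-context triples and derives a trivial "what does this resource provide" lookup of the kind an integration domain might use when planning navigation; the resource names and literals are hypothetical examples.

    // TypeScript sketch: resource-context triples using the has/provide/access constructs.
    type Predicate = "has" | "provide" | "access";
    type Triple = { subject: string; predicate: Predicate; object: string };

    const resourceContext: Triple[] = [
      { subject: "ProteinDataBank",  predicate: "provide", object: "protein structure entries" },
      { subject: "ProteinDataBank",  predicate: "has",     object: "keyword search interface" },
      { subject: "LocalComponentDB", predicate: "access",  object: "gene mutation records" },
    ];

    // Collect everything a given resource provides, as a stand-in for the navigation
    // plan that an integration domain could derive from such descriptions.
    function providedBy(resource: string): string[] {
      return resourceContext
        .filter(t => t.subject === resource && t.predicate === "provide")
        .map(t => t.object);
    }

    console.log(providedBy("ProteinDataBank")); // ["protein structure entries"]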

    Bridging mouse and human anatomies; a knowledge-based approach to comparative anatomy for disease model phenotyping.

    Get PDF
    The laboratory mouse is the foremost mammalian model used for studying human diseases and is anatomically closely related to humans. Whilst knowledge about human anatomy has been collected throughout the history of mankind, the first comprehensive study of mouse anatomy was published less than 60 years ago. This has been followed by the more recent publication of several books and resources on mouse anatomy. Nevertheless, to date, our understanding and knowledge of mouse anatomy is far from being at the same level as that of human anatomy. In addition, the alignment between current mouse and human anatomy nomenclatures is far less developed than that between other species, such as domestic animals, and humans. To close this gap, more in-depth research on mouse anatomy is needed, and it will be necessary to extend and refine the current vocabulary of mouse anatomical terms.

    Federated knowledge base debugging in DL-Lite A

    Full text link
    Due to the continuously growing amount of data, the federation of different and distributed data sources has gained increasing attention. In order to tackle the challenge of federating heterogeneous sources, a variety of approaches has been proposed. Especially in the context of the Semantic Web, the application of Description Logics is one of the preferred methods to model federated knowledge based on a well-defined syntax and semantics. However, the more data are available from heterogeneous sources, the higher the risk of inconsistency, which is a serious obstacle for performing reasoning tasks and query answering over a federated knowledge base. For a single knowledge base, the process of knowledge base debugging, comprising the identification and resolution of conflicting statements, has been widely studied, whereas federated settings integrating a network of loosely coupled data sources (such as LOD sources) have mostly been neglected. In this thesis we tackle the challenging problem of debugging federated knowledge bases and focus on a lightweight Description Logic language, called DL-LiteA, that is aimed at applications requiring efficient and scalable reasoning. After introducing formal foundations such as Description Logics and Semantic Web technologies, we clarify the motivating context of this work and discuss the general problem of information integration based on Description Logics. The main part of this thesis is subdivided into three subjects. First, we discuss the specific characteristics of federated knowledge bases and provide an appropriate approach for detecting and explaining contradictory statements in a federated DL-LiteA knowledge base. Second, we study the representation of the identified conflicts and their relationships as a conflict graph and propose an approach for repair generation based on majority voting and statistical evidence. Third, in order to provide an alternative way of handling inconsistency in federated DL-LiteA knowledge bases, we propose an automated approach for assessing adequate trust values (i.e., probabilities) at different levels of granularity by leveraging probabilistic inference over a graphical model. In the last part of this thesis, we evaluate the previously developed algorithms against a set of large distributed LOD sources. In the course of discussing the experimental results, it turns out that the proposed approaches are sufficient, efficient and scalable with respect to real-world scenarios. Moreover, due to the exploitation of the federated structure in our algorithms, it further becomes apparent that the number of identified wrong statements, the quality of the generated repair and the fineness of the assessed trust values profit from an increasing number of integrated sources.
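
    As a rough, hypothetical illustration of the majority-voting idea mentioned above (not the thesis' actual algorithm), the following sketch retracts a conflicting assertion when at most half of the integrated sources state it; the source names and assertions are invented.

    // TypeScript sketch: naive majority-voting repair over a set of conflicting assertions.
    type Vote = { assertion: string; statedBySources: string[] };

    function majorityRepair(conflict: Vote[], totalSources: number): string[] {
      // Keep an assertion only if more than half of the integrated sources state it;
      // the remaining assertions form the proposed repair (statements to retract).
      return conflict
        .filter(v => v.statedBySources.length <= totalSources / 2)
        .map(v => v.assertion);
    }

    const conflict: Vote[] = [
      { assertion: "capital(Germany, Berlin)", statedBySources: ["sourceA", "sourceB", "sourceC"] },
      { assertion: "capital(Germany, Bonn)",   statedBySources: ["sourceD"] },
    ];

    console.log(majorityRepair(conflict, 4)); // -> ["capital(Germany, Bonn)"]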