88 research outputs found

    Code Generation for Big Data Processing in the Web using WebAssembly

    Get PDF
    Traditional clusters for cloud computing are quite hard to configure and set up, and the number of cluster nodes is limited by the hardware available in the cluster. We therefore envision the concept of a Browser Cloud: one only has to visit a certain webpage with a web browser in order to connect his or her computer to the Browser Cloud. In this way the setup of the Browser Cloud is much easier than that of traditional clouds. Furthermore, the Browser Cloud has a much larger number of potential nodes, as any computer running a browser may connect to and be integrated into the Browser Cloud. New challenges arise when setting up a cloud from web browsers: data is processed within the browser, which requires using the technologies offered by the browser for this purpose. The typically used JavaScript runtime environment may be too slow, because JavaScript is an interpreted language. Hence we investigate the possibilities for computing the work-intensive part of query processing inside a virtual machine of the web browser. The WebAssembly virtual machine technology has recently gained support in all major browsers and promises large speedups in comparison with JavaScript. Recent approaches to efficient Big Data processing generate code for the data processing steps of queries. To run the generated code in a WebAssembly virtual machine, an online compiler is needed to generate WebAssembly bytecode from the generated code. Hence our main contribution is an online compiler to WebAssembly bytecode, developed especially to run in the web browser and aimed at Big Data processing based on code generation of the processing steps. In our experiments, the runtimes of Big Data processing using JavaScript are compared with those of the WebAssembly technologies in the major web browsers.
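
    As a minimal illustration of the execution model described above, the following sketch (not taken from the thesis) shows how bytecode emitted by such an online compiler could be instantiated and invoked in the browser using the standard WebAssembly JavaScript API; the file name query_step.wasm and the exported function filterRows are hypothetical placeholders for a code-generated query processing step.

    // TypeScript sketch: run a generated WebAssembly processing step in the browser.
    async function runGeneratedStep(): Promise<void> {
      // Fetch the bytecode produced by the online compiler and compile/instantiate it
      // with the standard WebAssembly JavaScript API.
      const response = await fetch("query_step.wasm"); // hypothetical compiler output
      const bytes = await response.arrayBuffer();
      const { instance } = await WebAssembly.instantiate(bytes);

      // Call the exported processing function; here it is assumed to take the number of
      // input rows and to return the number of rows that pass a generated filter.
      const filterRows = instance.exports.filterRows as (n: number) => number;
      console.log("rows after filter:", filterRows(1_000_000));
    }

    runGeneratedStep();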

    A cooperative framework for molecular biology database integration using image object selection

    Get PDF
    The theme and the concept of 'Molecular Biology Database Integration' and the problems associated with this concept initiated the idea for this Ph.D. research. The available technologies make it possible to analyse the data independently and discretely, but they fail to integrate the data resources into more meaningful information. This, along with the integration issues, created the scope for this Ph.D. research. The research has reviewed the 'database interoperability' problems and has suggested a framework for integrating the molecular biology databases. The framework proposes to develop a cooperative environment in which the molecular biology databases share information on the basis of a common purpose. The research has also reviewed other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. The research addresses the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementations; multi-database query using image object keying; database integration technologies using a context graph; and automated navigation among these databases. This thesis introduces a new approach for database implementation. It introduces an interoperable component database concept to initiate multidatabase queries on gene mutation data. A number of data models have been proposed for gene mutation data, which are the basis for integrating the target-specific component database with the federated information system. The proposed data models are: data models for genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows the non-redundant integration approach explained in the thesis. This is achieved by storing only attributes that are neither a union nor an intersection of any attributes that exist in public domain molecular biology databases. Unlike the data warehousing technique, this feature is quite unique and novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools, which are: a metadata extractor, a mapping linker, a query generator and a result interpreter. These tools are used for a transparent integration without creating any global schema of the participating databases. The thesis also establishes the concept of image object keying for multidatabase query and proposes a relevant algorithm for matching protein spots in gel electrophoresis images. An object spot in a gel electrophoresis image initiates the query when it is selected by the user; the selected spot is then matched with similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase querying, which requires writing complex SQL scripts, and it also resolves the semantic conflicts that exist among molecular biology databases. The research has proposed a new framework based on the context of the web data for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) increases interoperability by providing the description of the resources and the navigation plan for accessing the web-based databases.
A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. The interactions within the resources are achieved by utilising an integration domain to extract the required information in a single instance and without writing any query scripts. The integration domain allows the user to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target websites and unifies them as a whole object in a single page. The proposed framework is tested by finding specific information, e.g., information on Alzheimer's disease, from public domain biology resources such as the Protein Data Bank, the Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis gives further propositions and plans for future work.
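
    As a hedged illustration only, the following sketch (not from the thesis) expresses the has, provide and access constructs as plain resource-context triples and derives a trivial "what does this resource provide" lookup of the kind an integration domain might use when planning navigation; the resource names and literals are hypothetical examples.

    // TypeScript sketch: resource-context triples using the has/provide/access constructs.
    type Predicate = "has" | "provide" | "access";
    type Triple = { subject: string; predicate: Predicate; object: string };

    const resourceContext: Triple[] = [
      { subject: "ProteinDataBank",  predicate: "provide", object: "protein structure entries" },
      { subject: "ProteinDataBank",  predicate: "has",     object: "keyword search interface" },
      { subject: "LocalComponentDB", predicate: "access",  object: "gene mutation records" },
    ];

    // Collect everything a given resource provides, as a stand-in for the navigation
    // plan that an integration domain could derive from such descriptions.
    function providedBy(resource: string): string[] {
      return resourceContext
        .filter(t => t.subject === resource && t.predicate === "provide")
        .map(t => t.object);
    }

    console.log(providedBy("ProteinDataBank")); // ["protein structure entries"]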

    Bridging mouse and human anatomies; a knowledge-based approach to comparative anatomy for disease model phenotyping.

    Get PDF
    The laboratory mouse is the foremost mammalian model used for studying human diseases and is anatomically closely related to humans. Whilst knowledge about human anatomy has been collected throughout the history of mankind, the first comprehensive study of mouse anatomy was published less than 60 years ago. This has been followed by the more recent publication of several books and resources on mouse anatomy. Nevertheless, to date, our understanding and knowledge of mouse anatomy is far from being at the same level as that of human anatomy. In addition, the alignment between current mouse and human anatomy nomenclatures is far less developed than that between other species, such as domestic animals, and humans. To close this gap, more in-depth research on mouse anatomy is needed, and it will be necessary to extend and refine the current vocabulary of mouse anatomical terms.

    Federated knowledge base debugging in DL-Lite A

    Full text link
    Due to the continuously growing amount of data, the federation of different and distributed data sources has gained increasing attention. In order to tackle the challenge of federating heterogeneous sources, a variety of approaches has been proposed. Especially in the context of the Semantic Web, the application of Description Logics is one of the preferred methods to model federated knowledge based on a well-defined syntax and semantics. However, the more data are available from heterogeneous sources, the higher the risk of inconsistency, which is a serious obstacle for performing reasoning tasks and query answering over a federated knowledge base. For a single knowledge base, the process of knowledge base debugging, comprising the identification and resolution of conflicting statements, has been widely studied, whereas federated settings integrating a network of loosely coupled data sources (such as LOD sources) have mostly been neglected. In this thesis we tackle the challenging problem of debugging federated knowledge bases and focus on a lightweight Description Logic language, called DL-LiteA, that is aimed at applications requiring efficient and scalable reasoning. After introducing formal foundations such as Description Logics and Semantic Web technologies, we clarify the motivating context of this work and discuss the general problem of information integration based on Description Logics. The main part of this thesis is subdivided into three subjects. First, we discuss the specific characteristics of federated knowledge bases and provide an appropriate approach for detecting and explaining contradictory statements in a federated DL-LiteA knowledge base. Second, we study the representation of the identified conflicts and their relationships as a conflict graph and propose an approach for repair generation based on majority voting and statistical evidence. Third, in order to provide an alternative way of handling inconsistency in federated DL-LiteA knowledge bases, we propose an automated approach for assessing adequate trust values (i.e., probabilities) at different levels of granularity by leveraging probabilistic inference over a graphical model. In the last part of this thesis, we evaluate the previously developed algorithms against a set of large distributed LOD sources. In the course of discussing the experimental results, it turns out that the proposed approaches are sufficient, efficient and scalable with respect to real-world scenarios. Moreover, due to the exploitation of the federated structure in our algorithms, it further becomes apparent that the number of identified wrong statements, the quality of the generated repair and the fineness of the assessed trust values profit from an increasing number of integrated sources.
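
    As a rough, hypothetical illustration of the majority-voting idea mentioned above (not the thesis' actual algorithm), the following sketch retracts a conflicting assertion when at most half of the integrated sources state it; the source names and assertions are invented.

    // TypeScript sketch: naive majority-voting repair over a set of conflicting assertions.
    type Vote = { assertion: string; statedBySources: string[] };

    function majorityRepair(conflict: Vote[], totalSources: number): string[] {
      // Keep an assertion only if more than half of the integrated sources state it;
      // the remaining assertions form the proposed repair (statements to retract).
      return conflict
        .filter(v => v.statedBySources.length <= totalSources / 2)
        .map(v => v.assertion);
    }

    const conflict: Vote[] = [
      { assertion: "capital(Germany, Berlin)", statedBySources: ["sourceA", "sourceB", "sourceC"] },
      { assertion: "capital(Germany, Bonn)",   statedBySources: ["sourceD"] },
    ];

    console.log(majorityRepair(conflict, 4)); // -> ["capital(Germany, Bonn)"]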