
    Critical evaluation of the JDO API for the persistence and portability requirements of complex biological databases

    BACKGROUND: Complex biological database systems have become key computational tools used daily by scientists and researchers. Many of these systems must be capable of executing on multiple different hardware and software configurations and are also often made available to users via the Internet. We have used the Java Data Objects (JDO) persistence technology to develop the database layer of one such system, the SigPath information management system. SigPath is an example of a complex biological database that needs to store various types of information connected by many relationships. RESULTS: Using this system as an example, we perform a critical evaluation of current JDO technology and discuss the suitability of the JDO standard for achieving portability, scalability and performance. We show that JDO supports portability of the SigPath system from a relational database backend to an object database backend and achieves acceptable scalability. To answer the performance question, we have created the SigPath JDO application benchmark, which we distribute under the GNU General Public License. This benchmark can be used as an example of using JDO technology to create a complex biological database, and it makes it possible for vendors and users of the technology to evaluate the performance of other JDO implementations for similar applications. CONCLUSIONS: The SigPath JDO benchmark and our discussion of JDO technology in the context of biological databases will be useful to bioinformaticians who design new complex biological databases and aim to create systems that can be ported easily to a variety of database backends.

    SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

    SNAPPI-DB, a high-performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) are described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB), together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. An object-oriented database and the Java Data Objects (JDO) API provide rapid development, fast database access and the ability to perform advanced queries without complex SQL statements. SNAPPI-DB contains many features that are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at:

    Building a protein name dictionary from full text: a machine learning term extraction approach

    BACKGROUND: The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. RESULTS: We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. CONCLUSION: This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt

    The exploration of a category theory-based virtual Geometrical product specification system for design and manufacturing

    In order to ensure the quality of products and to facilitate global outsourcing, almost all so-called “world-class” manufacturing companies now apply various tools and methods to maintain the consistency of a product’s characteristics throughout its manufacturing life cycle. Among these, to ensure the consistency of geometric characteristics, a tolerancing language, the Geometrical Product Specification (GPS), has been widely adopted to precisely transform the functional requirements from customers into manufactured workpieces, expressed as tolerance notes in technical drawings. Although commonly acknowledged by industrial users as one of the most successful efforts in integrating existing manufacturing life-cycle standards, current GPS implementations and software packages suffer from several drawbacks in practical use, possibly the most significant being the difficulty of inferring the data for the “best” solutions. The problem stems from the foundation of the data structures and the knowledge-based system design. This indicates that a “new” software system is needed to facilitate GPS applications. The thesis introduces an innovative knowledge-based system, VirtualGPS, that provides an integrated GPS knowledge platform based on a stable and efficient database structure with facilities for generating and accessing knowledge. The system focuses on solving intrinsic product design and production problems by acting as a virtual domain expert, translating GPS standards and rules into computerized expert advice and warnings. Furthermore, the system can be used as a training tool that helps young and new engineers understand the large body of GPS standards relatively quickly. The thesis starts with a detailed discussion of the proposed categorical modelling mechanism, which is based on Category Theory. This mechanism provides a unified approach to knowledge acquisition and representation, knowledge-based system design, and database schema modelling. As a core part of assessing this knowledge-based system, the implementation of the categorical Database Management System (DBMS) is also presented. The focus then moves on to the design and implementation of the proposed VirtualGPS system. The tests and evaluations of this system are presented in Chapter 6. Finally, the thesis summarizes its contributions to knowledge in Chapter 7. After a thorough review of the project, the conclusion reached is that the entire VirtualGPS system was designed and implemented to conform to Category Theory and object-oriented programming rules. The initial tests and performance analyses show that the system facilitates geometric product manufacturing operations and benefits manufacturers and engineers alike, from function design to manufacturing and verification.


    Proceedings of the 4th International Conference on Principles and Practices of Programming in Java

    This book contains the proceedings of the 4th international conference on principles and practices of programming in Java. The conference focuses on the different aspects of the Java programming language and its applications

    When to Utilize Software as a Service

    Cloud computing enables on-demand network access to shared resources (e.g., computation, networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort. Cloud computing refers both to the applications delivered as services over the Internet and to the hardware and system software in the data centers. Software as a service (SaaS) is one of the cloud service models. SaaS is software deployed as a hosted service and accessed over the Internet. In SaaS, the consumer uses the provider’s applications running in the cloud. SaaS separates the possession and ownership of software from its use. The applications can be accessed from any device through a thin client interface. A typical SaaS application is used through a web browser and billed on a monthly basis. In this thesis, the characteristics of cloud computing and SaaS are presented, and a few implementation platforms for SaaS are discussed. Then, four different SaaS implementation cases and one transformation case are examined. The pros and cons of SaaS are studied on the basis of literature references and an analysis of the SaaS implementations and the transformation case. The analysis is done from both the customer’s and the service provider’s point of view. In addition, the pros and cons of on-premises software are listed. The purpose of this thesis is to find out when SaaS should be utilized and when it is better to choose traditional on-premises software. The qualities of SaaS bring many benefits for both the customer and the provider. A customer should utilize SaaS when it provides cost savings, ease, and scalability over on-premises software. SaaS is reasonable when the customer does not need tailoring and needs only a simple, general-purpose service, and when the application supports the customer’s core business. A provider should utilize SaaS when it offers cost savings, scalability, faster development, and a wider customer base than on-premises software. It is wise to choose SaaS when the application is cheap, is aimed at the mass market, needs frequent updating, needs high-performance computing, needs to store large amounts of data, or gains some other direct value from the cloud infrastructure.

    An informatics based approach to respiratory healthcare.

    By 2005, one person in every five UK households suffered from asthma. Research has shown that episodes of poor air quality can have a negative effect on respiratory health and are a growing concern for asthmatics. To better inform clinical staff and patients about the contribution of poor air quality to patient health, this thesis defines an IT architecture that systems can use to identify environmental predictors leading to a decline in the respiratory health of an individual patient. Personal environmental predictors of asthma exacerbation are identified by validating the delay between environmental predictors and the decline in respiratory health. The concept is demonstrated using prototype software, which indicates that the analytical methods provide a mechanism for producing an early warning of impending asthma exacerbation due to poor air quality. The author has introduced the term enviromedics to describe this new field of research. Pattern recognition techniques are used to analyse patient-specific environments and to extract meaningful health predictors from the large quantities of data involved (often in the region of '/o million data points). This research proposes an architecture that defines the processes and techniques needed to validate patient-specific environmental predictors of respiratory decline. The design of the architecture was validated by implementing prototype applications that demonstrate, through hospital admissions data and personal lung function monitoring, that air quality can be used as a predictor of patient-specific health. The refined techniques developed during the research (such as Feature Detection Analysis) were also validated by the application prototypes. This thesis makes several contributions to knowledge, including: the process architecture; Feature Detection Analysis (FDA), which automates the detection of trend reversals within time series data; validation of the delay characteristic using a Self-organising Map (SOM), which serves as an unsupervised method of pattern recognition; and Frequency, Boundary and Cluster Analysis (FBCA), an additional technique developed by this research to refine the SOM.
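
The abstract does not say how Feature Detection Analysis is implemented. As a rough illustration of the general idea of detecting trend reversals in a time series, the following minimal sketch marks each sample at which a series switches between rising and falling. The function name `find_trend_reversals` and the simple sign-change rule are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Tuple


def find_trend_reversals(series: List[float]) -> List[Tuple[int, str]]:
    """Return (index, kind) for each point where the series changes direction.

    Hypothetical helper for illustration only; not the thesis's FDA method.
    """
    reversals: List[Tuple[int, str]] = []
    prev_direction = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(series)):
        delta = series[i] - series[i - 1]
        if delta == 0:
            continue  # flat step: keep the previous direction
        direction = 1 if delta > 0 else -1
        if prev_direction != 0 and direction != prev_direction:
            # The turning point is the last sample before the direction changed.
            kind = "peak" if prev_direction > 0 else "trough"
            reversals.append((i - 1, kind))
        prev_direction = direction
    return reversals
```

A practical detector for noisy monitoring data would probably need smoothing or a minimum-change threshold before applying a rule like this; the sketch omits both for brevity.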
