220 research outputs found

    A REVIEW: CONCEPTUAL DATA MODELS FOR BIOLOGICAL DOMAIN

    ABSTRACT This paper surveys and reviews conceptual data models and novel data modeling techniques for biological data. The term conceptual data modeling is used here in a broad sense. Biological data, with its concepts and frameworks, is diverse in expressiveness under the umbrella of bioinformatics. If biological data is treated as a single field of research, it cannot be handled efficiently and completely. To provide highly maintainable and efficient solutions with lower cost and complexity, the scope must be reduced by dividing bioinformatics into subdomains. For these reasons, we consider only the central dogma of molecular biology, which produces sequence biological data (DNA, RNA and protein structures), in this review of conceptual modeling. Our objective is to provide a state-of-the-art study of conceptual data models for sequence biological data. Based on this research, we will propose a uniform data model for biological data for unification purposes. In this review, we analyze and critique existing conceptual biological data models and compare them on the basis of their proposed conceptual methodologies, metadata, modeling methods and other critical aspects necessary for sequence data. This study provides a state-of-the-art basis for the integration of biological data.

    -ilities Tradespace and Affordability Project – Phase 3

    One of the key elements of the SERC’s research strategy is transforming the practice of systems engineering and associated management practices – “SE and Management Transformation (SEMT).” The Grand Challenge goal for SEMT is to transform the DoD community’s current systems engineering and management methods, processes, and tools (MPTs) and practices away from sequential, single stovepipe system, hardware-first, document-driven, point-solution, acquisition-oriented approaches; and toward concurrent, portfolio and enterprise-oriented, hardware-software-human engineered, model-driven, set-based, full life cycle approaches. This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Office of the Assistant Secretary of Defense for Research and Engineering (ASD(R&E)) under Contract H98230-08-D-0171 (Task Order 0031, RT 046).

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools and applications. Cloud-based applications allow users to access software from web browsers without installing anything in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools or techniques with the base system over time. Moreover, large-scale data need to be processed within the required time frame for more effective analysis. Recently, Big Data technologies have emerged to facilitate large-scale data processing on commodity hardware. Among the above-mentioned systems, GenAP uses Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider very different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers face challenges in developing analytic tools that support large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little attention to devising a well-defined methodology and frameworks for the flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for developing cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for the software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to find out the stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge-base for software researchers, developers, and data engineers for cloud based Genotyping and Phenotyping analysis system development

    On-premise containerized, light-weight software solutions for Biomedicine

    Bioinformatics software systems are critical tools for analysing large-scale biological data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several software approaches on the design and implementation of bioinformatics software systems. These approaches include software patterns, microservices, distributed computing, containerisation and container orchestration. The research focuses on understanding how these techniques affect bioinformatics software systems’ reliability, scalability, performance, and efficiency. Furthermore, this research highlights the challenges and considerations involved in their implementation. This study also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to enhance the productivity and performance of bioinformatics software systems. The research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can significantly improve the code accessibility and structure of bioinformatics software systems. Specifically, microservices and containerisation also enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting advanced software engineering practices, such as model-driven design and container orchestration, can facilitate efficient and productive deployment and management of bioinformatics software systems, even for researchers with limited resources. Overall, we develop a software system integrating all our findings. Our proposed system demonstrated the ability to address challenges in bioinformatics. 
The thesis makes several key contributions in addressing the research questions surrounding the design, implementation, and optimisation of bioinformatics software systems using software patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can significantly improve bioinformatics software systems’ reliability, scalability, performance, efficiency, and productivity.
Bioinformatics software systems are important tools for analysing extensive biological data. However, their development and implementation can be challenging because of the required reliability, scalability, and performance. The aim of this work is to investigate the effects of software patterns, microservices, distributed systems, containerisation, and container orchestration on the architecture and implementation of bioinformatics software systems. The research focuses on understanding how these techniques affect the reliability, scalability, performance, and efficiency of bioinformatics software systems, and which challenges are associated with their conceptualisation and implementation. This work also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources, and the limitations of its use in this context. Furthermore, it examines the key factors that influence the success of bioinformatics software systems built with containerisation, microservices, and distributed computing, and how these can be optimised in the design and implementation process to increase the productivity and performance of bioinformatics software systems. The work was carried out through a combination of software development, experiments, and evaluation.
The results show that implementing software patterns can considerably improve the reliability and scalability of bioinformatics software systems. The use of microservices and containerisation likewise contributed to increasing the reliability, scalability, and performance of the system. In addition, the work shows that applying software engineering practices such as model-driven design and container orchestration can facilitate the efficient and productive deployment and management of bioinformatics software systems. Moreover, the implementation of this software system addresses challenges for research groups with limited resources. Overall, the system has shown that it is able to meet challenges in the field of bioinformatics and is thus a valuable tool for researchers in this field. This work makes several important contributions to answering research questions related to the design, implementation, and optimisation of software systems for bioinformatics using software engineering principles and practices. Our results indicate that incorporating these technologies can considerably improve the reliability, scalability, performance, efficiency, and productivity of bioinformatics software systems.

    A digital repository with an extensible data model for biobanking and genomic analysis management

    Motivation: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standardization is not feasible, and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. Results: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web-based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples of over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing.
Conclusions: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and makes it possible to track patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid. Peer reviewed
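
The process/event hierarchy described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' schema: the class names, field names (`data_type`, `metadata`, `files`) and the sample values are all hypothetical, chosen only to show how a process groups sequential events, how each event produces data described by user-defined metadata, and how a query on a metadata field might look.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataItem:
    """Data produced by an event (all field names are hypothetical)."""
    data_type: str                                 # user-defined type name
    metadata: dict = field(default_factory=dict)   # user-defined metadata
    files: list = field(default_factory=list)      # associated file names, if any


@dataclass
class Event:
    """One analysis step within a study."""
    name: str
    data: list = field(default_factory=list)


@dataclass
class Process:
    """A research study, modeled as an ordered group of events."""
    name: str
    events: list = field(default_factory=list)


def find_by_metadata(process, key, value):
    """Return every DataItem in the process whose metadata[key] equals value."""
    return [
        item
        for event in process.events
        for item in event.data
        if item.metadata.get(key) == value
    ]


# Invented example values, for illustration only.
study = Process(
    name="example-study",
    events=[
        Event("sample-collection", [DataItem("sample", {"tissue": "blood"})]),
        Event(
            "microarray-analysis",
            [DataItem("microarray", {"tissue": "blood"}, ["run1.cel"])],
        ),
    ],
)

# asdict() converts the nested dataclasses to plain dicts and lists,
# which json.dumps() can serialize directly.
document = json.dumps(asdict(study), indent=2)
```

Keeping metadata as a free-form dictionary is what lets new data types be introduced without changing the surrounding structure, which is the flexibility the abstract emphasizes.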

    Formalization of molecular interaction maps in systems biology; Application to simulations of the relationship between DNA damage response and circadian rhythms

    Quantitative exploration of biological pathway networks must begin with a qualitative understanding of them. Often researchers aggregate and disseminate experimental data using regulatory diagrams with ad hoc notations leading to ambiguous interpretations of presented results. This thesis has two main aims. First, it develops software to allow researchers to aggregate pathway data diagrammatically using the Molecular Interaction Map (MIM) notation in order to gain a better qualitative understanding of biological systems. Secondly, it develops a quantitative biological model to study the effect of DNA damage on circadian rhythms. The second aim benefits from the first by making use of visual representations to identify potential system boundaries for the quantitative model. I focus first on software for the MIM notation - a notation to concisely visualize bioregulatory complexity and to reduce ambiguity for readers. The thesis provides a formalized MIM specification for software implementation along with a base layer of software components for the inclusion of the MIM notation in other software packages. It also provides an implementation of the specification as a user-friendly tool, PathVisio-MIM, for creating and editing MIM diagrams along with software to validate and overlay external data onto the diagrams. I focus secondly on the application of the MIM software to the quantitative exploration of the poorly understood role of SIRT1 and PARP1, two NAD+-dependent enzymes, in the regulation of circadian rhythms during DNA damage response. SIRT1 and PARP1 participate in the regulation of several key DNA damage-repair proteins and are the subjects of study as potential cancer therapeutic targets. In this part of the thesis, I present an ordinary differential equation (ODE) model that simulates the core circadian clock and the involvement of SIRT1 in both the positive and negative arms of circadian regulation. 
I then use this model to predict a potential role for the competition for NAD+ supplies between SIRT1 and PARP1, leading to the observed behavior of primarily phase advancement of circadian oscillations during DNA damage response. The model further predicts a potential mechanism by which multiple forms of post-transcriptional modification may cooperate to produce primarily phase advancement.

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Aerospace Medicine and Biology: a Continuing Bibliography with Indexes (supplement 330)

    This bibliography lists 156 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during November 1989. Subject coverage includes: aerospace medicine and psychology, life support system and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance

    Sharing and viewing segments of electronic patient records service (SVSEPRS) using multidimensional database model

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The focus on healthcare information technology has never been greater than it is today. This awareness arises from efforts to achieve the fullest use of the Electronic Health Record (EHR). Due to the greater mobility of the population, the EHR will be constructed and continuously updated from the contributions of one or many Electronic Patient Records (EPRs) that are created and stored at different healthcare locations such as acute Hospitals, community services, Mental Health and Social Services. The challenge is to provide healthcare professionals, remotely and across heterogeneous interoperable systems, with a complete view of the selected relevant and vital EPR fragments of each patient during their care. Obtaining extensive EPRs at the point of delivery, together with the ability to search for and view vital, valuable, accurate and relevant EPR fragments, can still be challenging. There is a need to reduce redundancy, enhance the quality of medical decision making, and decrease the time needed to navigate through a very high number of EPRs, which in turn improves the workflow and eases the extra work required of clinicians. These demands were evaluated by introducing a system model named SVSEPRS (Searching and Viewing Segments of Electronic Patient Records Service), which enables healthcare providers to supply high-quality and more efficient services and to avoid redundant clinical diagnostic tests. Inappropriate medical decision making should also be avoided by allowing all patients' previous clinical tests and healthcare information to be shared between various healthcare organizations. The multidimensional data model, which lies at the core of On-Line Analytical Processing (OLAP) systems, can handle the duplication of healthcare services.
This is done by allowing quick search of and access to vital and relevant fragments from scattered EPRs, giving a more comprehensive picture and promoting advances in the diagnosis and treatment of illnesses. SVSEPRS is a web-based system model that helps participants search for and view virtual EPR segments using a well-structured Centralised Multidimensional Search Mapping (CMDSM). The CMDSM defines quantitative values (measures) and descriptive categories (dimensions), which allow clinicians to slice and dice, drill down to more detailed levels, or roll up to higher levels to find the fragments they require.
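
The OLAP operations named in this abstract (slice, dice, drill down, roll up) can be sketched in a few lines of Python. This is an illustration of the general multidimensional idea only, not the CMDSM itself: the dimension names (`organisation`, `year`, `month`, `test`), the measure (`count`) and every value in `FACTS` are invented for the example.

```python
from collections import defaultdict

# Each fact row holds dimension values plus one measure.
# All names and numbers here are invented for illustration.
FACTS = [
    {"organisation": "Hospital A", "year": 2009, "month": 1, "test": "blood", "count": 3},
    {"organisation": "Hospital A", "year": 2009, "month": 2, "test": "x-ray", "count": 1},
    {"organisation": "Clinic B", "year": 2009, "month": 1, "test": "blood", "count": 2},
    {"organisation": "Clinic B", "year": 2010, "month": 5, "test": "blood", "count": 4},
]


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dimension] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose dimensions fall inside the given value sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def roll_up(facts, dimensions, measure="count"):
    """Roll up: sum the measure over every dimension that is not listed."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dimensions)] += f[measure]
    return dict(totals)


by_year = roll_up(FACTS, ["year"])                 # coarse level
by_year_month = roll_up(FACTS, ["year", "month"])  # drill down one level
blood_only = slice_(FACTS, "test", "blood")
hospital_a_2009 = dice(FACTS, organisation={"Hospital A"}, year={2009})
```

Drilling down is simply rolling up over a longer list of dimensions, so a single aggregation function covers both directions of navigation.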