Application of Software Engineering Principles to Synthetic Biology and Emerging Regulatory Concerns
As the science of synthetic biology matures, engineers have begun to deliver real-world applications that mark the beginning of what could radically transform our lives. Examples of recent transformative progress include: 1) synthesizing chemicals for medicines that are expensive and difficult to produce; 2) producing protein alternatives; 3) altering genomes to combat deadly diseases; 4) killing antibiotic-resistant pathogens; and 5) speeding up vaccine production.
Although synthetic biology promises great benefits, many stakeholders have expressed concerns over safety and security risks from creating biological behavior never seen before in nature. As with any emerging technology, there is a risk of malicious use, known as the dual-use problem. The technology is becoming democratized and de-skilled, and people in do-it-yourself communities can tinker with genetic code, much as spreadsheet macros made programming accessible to non-programmers. While easy to program, novel biological behavior may be non-trivial to validate. Nevertheless, we must be able to certify that synthetically engineered organisms behave as expected, and be confident they will not harm natural life or the environment.
Synthetic biology is an interdisciplinary engineering domain, and interdisciplinary problems require interdisciplinary solutions. Using an interdisciplinary approach, this dissertation lays foundations for verifying, validating, and certifying safety and security of synthetic biology applications through traditional software engineering concepts about safety, security, and reliability of systems. These techniques can help stakeholders navigate what is currently a confusing regulatory process.
The contributions of this dissertation are: 1) creation of domain-specific patterns to help synthetic biologists develop assurance cases using evidence and arguments to validate safety and security of designs; 2) application of software product lines and feature models to the modular DNA parts of synthetic biology commonly known as BioBricks, making it easier to find safety features during design; 3) a technique for analyzing DNA sequence motifs to help characterize proteins as toxins or non-toxins; 4) a legal investigation regarding what makes regulating synthetic biology challenging; and 5) a repeatable workflow for leveraging safety and security artifacts to develop assurance cases for synthetic biology systems.
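Contribution 2 applies feature models to BioBrick parts so that constraints among parts (and safety features) can be checked at design time. A minimal sketch of that idea follows; the part names and constraints are illustrative, not taken from the dissertation's actual models.

```python
# Illustrative feature-model check for a hypothetical BioBrick design.
# REQUIRES maps a part to parts it depends on; EXCLUDES lists mutually
# exclusive alternatives. Neither set reflects real BioBrick constraints.
REQUIRES = {
    "coding_sequence": {"promoter", "ribosome_binding_site"},  # a CDS needs upstream parts
    "kill_switch": {"coding_sequence"},                        # safety feature acts on a CDS
}
EXCLUDES = [
    ("constitutive_promoter", "inducible_promoter"),           # at most one promoter type
]

def valid_design(selected):
    """Return True if the selected set of parts satisfies all constraints."""
    for part, needed in REQUIRES.items():
        if part in selected and not needed <= selected:
            return False
    for a, b in EXCLUDES:
        if a in selected and b in selected:
            return False
    return True

design = {"promoter", "ribosome_binding_site", "coding_sequence", "kill_switch"}
print(valid_design(design))  # True: the design satisfies every constraint
```

A design-time check like this is how a feature model can surface a missing safety feature (e.g., a kill switch without its required coding sequence) before anything is synthesized.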
Advisers: Myra B. Cohen and Brittany A. Dunca
Notebook-as-a-VRE (NaaVRE): From private notebooks to a collaborative cloud virtual research environment
Virtual Research Environments (VREs) provide user-centric support in the
lifecycle of research activities, e.g., discovering and accessing research
assets, or composing and executing application workflows. A typical VRE is
often implemented as an integrated environment, which includes a catalog of
research assets, a workflow management system, a data management framework, and
tools for enabling collaboration among users. Notebook environments, such as
Jupyter, allow researchers to rapidly prototype scientific code and share their
experiments as online accessible notebooks. Jupyter can support several popular
languages that are used by data scientists, such as Python, R, and Julia.
However, such notebook environments do not have seamless support for running
heavy computations on remote infrastructure or finding and accessing software
code inside notebooks. This paper investigates the gap between a notebook
environment and a VRE and proposes an embedded VRE solution for the Jupyter
environment called Notebook-as-a-VRE (NaaVRE). The NaaVRE solution provides
functional components via a component marketplace and allows users to create a
customized VRE on top of the Jupyter environment. From the VRE, a user can
search research assets (data, software, and algorithms), compose workflows,
manage the lifecycle of an experiment, and share the results among users in the
community. We demonstrate how such a solution can enhance a legacy workflow
that uses Light Detection and Ranging (LiDAR) data from country-wide airborne
laser scanning surveys for deriving geospatial data products of ecosystem
structure at high resolution over broad spatial extents. This enables users to
scale out the processing of multi-terabyte LiDAR point clouds for ecological
applications to more data sources in a distributed cloud environment.
Comment: A revised version has been published in the journal Software: Practice and Experience.
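The core NaaVRE idea, treating individual notebook cells as reusable workflow steps that can be composed and fanned out over remote resources, can be sketched roughly as below. The step functions and the tile-based LiDAR pipeline are illustrative stand-ins, not the actual NaaVRE API.

```python
# Hedged sketch: notebook "cells" as composable workflow steps applied to
# data tiles in parallel. Real NaaVRE containerizes cells and runs them on
# cloud infrastructure; here a thread pool stands in for that scaling out.
from concurrent.futures import ThreadPoolExecutor

def normalize_heights(tile):      # stand-in for one preprocessing cell
    base = min(tile)
    return [h - base for h in tile]

def canopy_height(tile):          # stand-in for one metric-derivation cell
    return max(tile)

def run_pipeline(tiles, steps, workers=4):
    """Apply each 'cell' to every tile, fanning the tiles out to workers."""
    results = tiles
    for step in steps:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(step, results))
    return results

tiles = [[3.0, 7.5, 12.0], [1.0, 4.0, 9.5]]   # toy LiDAR height tiles
print(run_pipeline(tiles, [normalize_heights, canopy_height]))  # [9.0, 8.5]
```

The same structure (per-tile map over a pool of workers) is what lets a multi-terabyte point-cloud job scale horizontally: each tile is independent, so adding workers adds throughput.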
A Domain-Specific Language and Editor for Parallel Particle Methods
Domain-specific languages (DSLs) are of increasing importance in scientific
high-performance computing to reduce development costs, raise the level of
abstraction and, thus, ease scientific programming. However, designing and
implementing DSLs is not an easy task, as it requires knowledge of the
application domain and experience in language engineering and compilers.
Consequently, many DSLs follow a weak approach using macros or text generators,
which lack many of the features that make a DSL comfortable for programmers.
Some of these features---e.g., syntax highlighting, type inference, error
reporting, and code completion---are easily provided by language workbenches,
which combine language engineering techniques and tools in a common ecosystem.
In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL
and development environment for numerical simulations based on particle methods
and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS),
a projectional language workbench. PPME is the successor of the Parallel
Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional
implementation strategies. We analyze and compare both languages and
demonstrate how the programmer's experience can be improved using static
analyses and projectional editing. Furthermore, we present an explicit domain
model for particle abstractions and the first formal type system for particle
methods.
Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25, 201
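The paper's "formal type system for particle methods" can be pictured with a toy static check: particle properties carry declared kinds, and an update expression is rejected before it runs if the kinds do not line up. The property names and rules below are invented for illustration and are not PPME's actual type system.

```python
# Toy "compile-time" check in the spirit of a particle-method type system:
# every property of a particle has a declared kind, and an update
# p.lhs <- p.a + p.b + ... must be kind-consistent. Names are illustrative.
PARTICLE_TYPE = {"position": "vector", "velocity": "vector", "mass": "scalar"}

def check_update(lhs, rhs_terms):
    """Reject updates that reference unknown properties or mix kinds."""
    for prop in [lhs, *rhs_terms]:
        if prop not in PARTICLE_TYPE:
            raise TypeError(f"unknown particle property: {prop}")
    kinds = {PARTICLE_TYPE[t] for t in rhs_terms}
    if kinds != {PARTICLE_TYPE[lhs]}:
        raise TypeError(f"cannot assign {kinds} to {PARTICLE_TYPE[lhs]} '{lhs}'")

check_update("position", ["position", "velocity"])  # ok: vector = vector + vector
```

In a projectional workbench such as MPS, a check like this would surface as an inline error marker while editing, rather than as a failure at simulation time.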
Studies on distributed approaches for large scale multi-criteria protein structure comparison and analysis
Protein Structure Comparison (PSC) is at the core of many important structural biology problems. PSC is used to infer the evolutionary history of distantly related proteins; it can also help in the identification of the biological function of a new protein by comparing it with other proteins whose function has already been annotated; PSC is also a key step in protein structure prediction, because one needs to reliably and efficiently compare tens or hundreds of thousands of decoys (predicted structures) when evaluating 'native-like' candidates (e.g., in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment). Each of these applications, as well as many others where molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the Multi-Criteria Protein Structure Comparison (MC-PSC) problem. ProCKSI (www.procksi.org) was the first publicly available server to provide algorithmic solutions for the MC-PSC problem by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods (e.g. USM, FAST, MaxCMO, DaliLite, CE and TMAlign). The current MC-PSC implementation works well for moderately sized data sets, but it is time consuming, as it provides a public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology.
This research investigates Grid-style distributed computing strategies for the solution of the enormous computational challenge inherent in MC-PSC. To this end, a novel distributed algorithm has been designed, implemented, and evaluated with different load balancing strategies and with a selection and configuration of a variety of software tools, services, and technologies at different levels of infrastructure, ranging from local testbeds to production-level eScience infrastructures such as the National Grid Service (NGS). Empirical results of different experiments reporting on the scalability, speedup, and efficiency of the overall system are presented and discussed, along with the software engineering aspects behind the implementation of a distributed solution to the MC-PSC problem on a local computer cluster as well as in a Grid implementation. The results lead us to conclude that the combination of better and faster parallel and distributed algorithms with more similarity comparison methods provides an unprecedented advance in protein structure comparison and analysis technology. These advances might facilitate both directed and fortuitous discovery of protein similarities, families, super-families, domains, etc., and help pave the way to faster and better protein function inference, annotation, and protein structure prediction and assessment, thus empowering structural biologists to do science that they could not have done otherwise.
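The MC-PSC workload has a natural parallel shape: every protein pair is compared by several methods, the per-method scores are fused, and the independent pairs can be distributed across workers. The sketch below illustrates that shape only; the scoring functions are placeholders, not the real USM/MaxCMO/TMAlign algorithms, and a thread pool stands in for Grid workers.

```python
# Illustrative shape of MC-PSC: all-pairs comparison by multiple methods,
# fused per pair, with pairs distributed to workers. Mock scorers only.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def mock_method(name):
    # placeholder for a real comparison method (USM, MaxCMO, TMAlign, ...)
    return lambda a, b: abs(len(a) - len(b)) / max(len(a), len(b))

METHODS = {name: mock_method(name) for name in ("USM", "MaxCMO", "TMAlign")}

def compare_pair(pair):
    a, b = pair
    scores = {name: fn(a, b) for name, fn in METHODS.items()}
    fused = sum(scores.values()) / len(scores)   # simple averaging as the fusion step
    return (a, b, fused)

proteins = ["MKV", "MKVL", "MKVLAT"]   # toy stand-ins for protein structures
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compare_pair, combinations(proteins, 2)))
print(len(results))  # 3 pairs for 3 proteins
```

Because the n(n-1)/2 pairs are independent, load balancing reduces to distributing pairs evenly across workers, which is the dimension the dissertation's experiments explore at Grid scale.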
Distributed Web Service Coordination for Collaboration Applications and Biological Workflows
In this dissertation work, we have investigated the main research thrust of decentralized coordination of workflows over web services. To address distributed workflow coordination, we first developed “Web Coordination Bonds” as a capable set of dependency modeling primitives that enable each web service to manage its own dependencies. Web bond primitives are as powerful as extended Petri nets and have sufficient modeling and expressive capabilities to model workflow dependencies. We have designed and prototyped our “Web Service Coordination Management Middleware” (WSCMM) system, which enhances the current web services infrastructure to accommodate web-bond-enabled web services. Finally, based on the core concepts of web coordination bonds and WSCMM, we have developed the “BondFlow” system, which allows easy configuration and distributed coordination of workflows. The footprint of the BondFlow runtime is 24 KB; the additional third-party software packages, a SOAP client and an XML parser, account for 115 KB.
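The central idea, each service managing its own dependencies so that coordination needs no central engine, can be modeled in a few lines. This is a toy in the spirit of web coordination bonds, not the WSCMM/BondFlow implementation; the service names are invented.

```python
# Toy model of decentralized coordination: each service holds its own
# "bonds" (dependencies) and may execute only once they have completed,
# much like a transition in a Petri net firing when its places are marked.
class Service:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.done = False

    def ready(self):
        return all(dep.done for dep in self.depends_on)

    def execute(self):
        if not self.ready():
            raise RuntimeError(f"{self.name}: unmet dependencies")
        self.done = True
        return self.name

fetch = Service("fetch_data")
analyze = Service("analyze", depends_on=[fetch])
fetch.execute()
print(analyze.execute())  # prints "analyze": allowed only after its dependency ran
```

Because each `Service` checks only its own bonds, no global coordinator is needed, which is the property that makes the approach decentralized.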