Kroak: A metadata collection system for long term microbial community monitoring
Amplytica is a start-up company whose software, the Amplytica Cloud Platform, helps organizations determine how microbes influence their bioprocesses. Examples of such bioprocesses include anaerobic digestion, wastewater treatment and mine site reclamation. The Amplytica Cloud Platform does this by integrating and analyzing metagenomically derived microbial community data (species composition, diversity, and abundance) and industrial bioprocess data (e.g. temperature, pH, nutrients). To achieve data integration, industrial bioprocess data is treated as metadata attached to the microbial community information, describing the environmental conditions in which the microbial community is found. Capturing this industrial metadata requires a robust metadata capture system. Kroak is a metadata capture system for the Amplytica Cloud Platform that facilitates tagging per-sample microbial community information with industrial environmental metadata. It uses a modern web interface for easy deployment, Office Open XML Workbook (XLSX) template files for easy metadata capture, and metadata classes to ensure data consistency and type identification for follow-on automated statistics and machine learning. Kroak is a functional metadata capture system which will be iteratively improved upon by Amplytica. Potential improvements include changes to Kroak's data model, increased reliability of its metadata parsing, and expansion of its existing web application programming interface.
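As a rough illustration of the workflow this abstract describes (XLSX templates plus typed metadata classes for per-sample tagging), the following Python sketch reads a template and coerces each column into a metadata class. The column names, class definitions and file name are illustrative assumptions, not Kroak's actual data model.

```python
# A minimal sketch of template-based metadata capture in the spirit described
# above. Column layout, class names and file name are illustrative assumptions.
from dataclasses import dataclass
from openpyxl import load_workbook


@dataclass
class NumericMetadata:
    """Continuous bioprocess measurements, e.g. temperature or pH."""
    name: str
    value: float
    unit: str


@dataclass
class CategoricalMetadata:
    """Discrete descriptors, e.g. reactor identifier or feed type."""
    name: str
    value: str


# Hypothetical mapping from template columns to metadata classes.
FIELD_CLASSES = {
    "temperature_c": (NumericMetadata, "degC"),
    "ph": (NumericMetadata, "pH"),
    "reactor_id": (CategoricalMetadata, None),
}


def parse_template(path: str) -> dict[str, list]:
    """Read an XLSX template: first column = sample ID, remaining columns = metadata."""
    ws = load_workbook(path, data_only=True).active
    header = [c.value for c in next(ws.iter_rows(max_row=1))]
    samples: dict[str, list] = {}
    for row in ws.iter_rows(min_row=2, values_only=True):
        sample_id, fields = str(row[0]), []
        for name, raw in zip(header[1:], row[1:]):
            if raw is None:
                continue  # leave gaps empty rather than guessing a value
            cls, unit = FIELD_CLASSES.get(name, (CategoricalMetadata, None))
            if cls is NumericMetadata:
                fields.append(NumericMetadata(name, float(raw), unit))
            else:
                fields.append(CategoricalMetadata(name, str(raw)))
        samples[sample_id] = fields
    return samples
```

Typing each field up front is what makes the captured metadata directly usable by downstream statistics and machine learning, since numeric and categorical variables need different treatment there.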
Process modelling for information system description
My previous experience and preliminary study of the relevant technical literature allowed me to identify several reasons why the current state of database theory seemed unsatisfactory and required further research. These reasons included: insufficient formalism of data semantics, misinterpretation of NULL values, inconsistencies in the concept of the universal relation, certain ambiguities in domain definition, and inadequate representation of facts and constraints.
The commonly accepted 'sequentiality' principle in most current system design methodologies imposes strong restrictions on the processes of which a target system is composed. They must be algorithmic and must not be interrupted during execution; nor may they have any parallel subprocesses as components. This principle can no longer be considered acceptable: in many existing systems, multiple processors perform concurrent actions that can interact with one another.
Overconcentration on data models is another shortcoming of the majority of system design methods. Many techniques pay little or no attention to process definition; they assume that the model of the Real World consists only of data elements and the relationships among them. However, the way processes are related to each other (in terms of a precedence relation) may have considerable impact on the data model.
It has been assumed that the Real World is discretisable, i.e. that it may be modelled by a structure of objects. The word object is to be interpreted in a wide sense, so it can mean anything within the boundaries of the part of the Real World that is to be represented in the target system. An object may denote a fact, a physical or abstract entity, relationships between any of these, relationships between relationships, or an even more complex structure.
A fundamental hypothesis was formulated stating the necessity of considering three aspects of modelling (syntax, semantics and behaviour) and of considering them integrally.
A syntactic representation of an object within a target system is called a construct. A construct which cannot be decomposed further (either syntactically or semantically) is defined to be an atom. Any construct is a result of the following production rules: construct ::= atom | function construct; function ::= atom | construct. This syntax forms a sentential notation.
The sentential notation allows for extensive use of denotational semantics. The meaning of a construct may be defined as a function mapping the set of syntactic constructs to appropriate semantic domains; these in turn are sets of functions, since a construct may have a meaning in more than one class of objects. Because of this functional form, the meaning of a construct may be derived from the meanings of its components.
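A minimal sketch of the sentential notation and a compositional meaning function is given below. The concrete semantic domain (numbers and unary functions looked up in an environment) is an illustrative assumption, not the thesis's own domain; only the shape of the grammar and the compositional derivation of meaning follow the text.

```python
# construct ::= atom | function construct ; function ::= atom | construct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class App:
    function: "Construct"   # the function position is itself a construct
    argument: "Construct"


Construct = Union[Atom, App]

# One possible semantic domain: atoms denote either numbers or unary functions.
ENV: dict[str, object] = {
    "three": 3,
    "succ": lambda x: x + 1,
    "double": lambda x: 2 * x,
}


def meaning(c: Construct) -> object:
    """The meaning of a construct is derived from the meanings of its components."""
    if isinstance(c, Atom):
        return ENV[c.name]
    f = meaning(c.function)   # meaning of the function position
    x = meaning(c.argument)   # meaning of the argument construct
    return f(x)


# (double (succ three)) evaluates to 8 under this illustrative interpretation.
print(meaning(App(Atom("double"), App(Atom("succ"), Atom("three")))))
```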
The issue of system behaviour needed further investigation and a revision of the conventional model of computing. The sequentiality principle has been rejected, with concurrency regarded as a natural property of processes. A postulate has been formulated that any potential parallelism should be used constructively for data/process design and that the process structure affects the data model. An important distinction has been made between a process declaration, considered a form of data or an abstraction of knowledge, and a process application, which corresponds to a physical action performed by a processor according to a specific process declaration. In principle, a process may be applied to any construct, including its own representation, and it is a matter of semantics whether or not it is sensible to do so. The process application mechanism has been explained in terms of formal systems theory by introducing an abstract machine with two types of input channels and two types of output channels.
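The declaration/application distinction can be sketched as follows: a process declaration is plain data, while an application is a processor actually executing it. The representation below is an illustrative assumption, not the thesis's abstract machine.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ProcessDeclaration:
    """An abstraction of knowledge: a named rule for transforming a construct."""
    name: str
    body: Callable[[Any], Any]


def apply_process(decl: ProcessDeclaration, construct: Any) -> Any:
    """A process application: an action performed according to a declaration."""
    return decl.body(construct)


describe = ProcessDeclaration("describe", lambda c: f"construct: {c!r}")

# A process may be applied to any construct, including its own representation;
# whether that is sensible is a question of semantics, not of the mechanism.
print(apply_process(describe, 42))
print(apply_process(describe, describe))
```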
System behaviour has been described by defining a process calculus. It is based on the logical and functional properties of a discrete time model and provides a means to handle expressions composed of process variables connected by logical functors. The basic terms of the calculus are constructs and operations (equivalence, approximation, precedence, incidence, free-parallelism, strict-parallelism). Certain properties of these operations (e.g. associativity or transitivity) allow large expressions to be handled. Rules for decomposing and integrating process applications, analogous in some sense to those forming the basis of structured programming, have been derived.
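The following sketch shows process-calculus-style expressions built from process variables and two of the connectives named above, precedence and free-parallelism; the flattening rule is one illustrative reading of associativity, not the calculus's actual axioms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    name: str


@dataclass(frozen=True)
class Seq:
    parts: tuple   # precedence: each part must complete before the next starts


@dataclass(frozen=True)
class Par:
    parts: tuple   # free-parallelism: parts may run concurrently


def seq(*parts):
    """Associativity lets nested precedences be flattened into one expression."""
    flat = []
    for p in parts:
        flat.extend(p.parts if isinstance(p, Seq) else [p])
    return Seq(tuple(flat))


def par(*parts):
    flat = []
    for p in parts:
        flat.extend(p.parts if isinstance(p, Par) else [p])
    return Par(tuple(flat))


# (read ; validate) ; (store || index)  ==  read ; validate ; (store || index)
expr = seq(seq(Proc("read"), Proc("validate")),
           par(Proc("store"), Proc("index")))
print(expr)
```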
XML documents schema design
The eXtensible Markup Language (XML) is fast emerging as the dominant standard for storing, describing and interchanging data among various systems and databases on the internet. It offers schema languages such as the Document Type Definition (DTD) or XML Schema Definition (XSD) for defining the syntax and structure of XML documents. To enable efficient use of XML documents in any application in a large-scale electronic environment, it is necessary to avoid data redundancies and update anomalies. Redundancy and anomalies in XML documents can lead not only to higher data storage cost but also to increased costs for data transfer and data manipulation. To overcome this problem, this thesis proposes to establish a formal framework for XML document schema design. To achieve this aim, we propose a method to improve and simplify XML schema design by incorporating a conceptual model of the DTD with the theory of database normalization. A conceptual diagram, the Graph-Document Type Definition (G-DTD), is proposed to describe the structure of XML documents at the schema level. For the G-DTD itself, we define a structure which incorporates attributes, simple elements, complex elements, and the relationship types among them. Furthermore, semantic constraints are also precisely defined in order to capture semantic meanings among the defined XML objects. In addition, to provide a guideline for a well-designed XML document schema, we propose a set of normal forms for G-DTD on the basis of rules proposed by Arenas and Libkin and by Lv et al. The corresponding normalization rules to transform a G-DTD into a normal-form schema are also discussed. A case study is given to illustrate the applicability of the concept. As a result, we found that the new normal forms are more concise and practical, in particular because they allow the user to find an 'optimal' structure of XML elements/attributes at the schema level. To show that our approach is usable by a database designer, we developed a prototype of the XML document schema design process using the Z formal specification language. Finally, using the same case study, this formal specification was tested to check the correctness and consistency of the specification. This gives confidence that our prototype can be implemented successfully to generate an automatic XML schema design.
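The kind of redundancy such normal forms are meant to rule out can be illustrated with a toy check: if a dependent element's value is functionally determined by another element and the same pair is stored repeatedly, the document carries redundant data and the schema is a candidate for restructuring. The element names, dependency, and check below are illustrative assumptions, not the thesis's G-DTD notation.

```python
# Toy redundancy check in the spirit of XML normal forms.
import xml.etree.ElementTree as ET

DOC = """
<orders>
  <order id="1"><customer>C7</customer><city>Leeds</city></order>
  <order id="2"><customer>C7</customer><city>Leeds</city></order>
  <order id="3"><customer>C9</customer><city>York</city></order>
</orders>
"""


def redundant_dependency(root, determinant: str, dependent: str) -> bool:
    """True when determinant -> dependent holds but the same pair is stored
    more than once, i.e. the dependent value is recorded redundantly."""
    pairs: dict[str, set[str]] = {}
    counts: dict[str, int] = {}
    for order in root.findall("order"):
        key = order.findtext(determinant)
        pairs.setdefault(key, set()).add(order.findtext(dependent))
        counts[key] = counts.get(key, 0) + 1
    holds = all(len(vals) == 1 for vals in pairs.values())
    repeated = any(n > 1 for n in counts.values())
    return holds and repeated


root = ET.fromstring(DOC)
print(redundant_dependency(root, "customer", "city"))   # True: city repeated per customer
```

A normalized schema would factor the customer/city pair into its own element type and reference it from each order, removing the repetition.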
A new formal and analytical process to product modeling (PPM) method and its application to the precast concrete industry
The current standard product (data) modeling process relies on the experience and subjectivity of data modelers, who use their experience to eliminate redundancies and identify omissions. As a result, product modeling becomes a social activity that involves iterative review processes by committees. This study aims to develop a new, formal method for deriving product models from data collected in process models of companies within an industry sector. The theoretical goals of this study are to provide a scientific foundation to bridge the requirements collection phase and the logical modeling phase of product modeling, and to formalize the derivation and normalization of a product model from the processes it supports. To achieve these goals, a new and formal method, Georgia Tech Process to Product Modeling (GTPPM), has been proposed. GTPPM consists of two modules. The first module is the Requirements Collection and Modeling (RCM) module. It provides semantics and a mechanism to define a process model, the information items used by each activity, and the information flow between activities. The logic to dynamically check the consistency of information flow within a process has also been developed. The second module is the Logical Product Modeling (LPM) module. It integrates, decomposes, and normalizes information constructs collected from a process model into a preliminary product model. Nine design patterns are defined to resolve conflicts between information constructs (ICs) and to normalize the resultant model. These two modules have been implemented as a Microsoft Visio™ add-on. The tool has been registered and is also called GTPPM™. The method has been tested and evaluated in the precast concrete sector of the construction industry through several GTPPM modeling efforts. By using GTPPM, a complete set of information items required for product modeling for a medium-sized or large industry can be collected without generalizing each company's unique process into one unified high-level model. However, the use of GTPPM is not limited to product modeling. It can be deployed in several other areas, including workflow management system or MIS (Management Information System) development, software specification development, and business process re-engineering. Ph.D. Committee Chair: Eastman, Charles M.; Committee Co-Chair: Augenbroe, Godfried; Committee Co-Chair: Navathe, Shamkant B.; Committee Member: Hardwick, Martin; Committee Member: Sacks, Rafael
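The information-flow consistency check described for the RCM module can be sketched for the simplest case of a linearly ordered process: every item an activity consumes must be produced by some activity that precedes it. The activity names and items below are illustrative assumptions, not GTPPM's actual semantics.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    consumes: set = field(default_factory=set)
    produces: set = field(default_factory=set)


def check_flow(ordered_activities):
    """Return (activity, missing items) pairs where an input has no upstream producer."""
    available, problems = set(), []
    for act in ordered_activities:
        missing = act.consumes - available
        if missing:
            problems.append((act.name, missing))
        available |= act.produces
    return problems


process = [
    Activity("design piece", produces={"piece geometry"}),
    Activity("detail reinforcement", consumes={"piece geometry"},
             produces={"rebar schedule"}),
    Activity("plan erection", consumes={"piece geometry", "crane capacity"}),
]
print(check_flow(process))   # [('plan erection', {'crane capacity'})]
```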
Canonical queries as a query answering device (Information Science)
Issued as Annual reports [nos. 1-2], and Final report, Project no. G-36-60
The semantic database model as a basis for an automated database design tool
Bibliography: p. 257-80. The automatic database design system is a design aid for network database creation. It obtains a requirements specification from a user and generates a prototype database. This database is compatible with the Data Definition Language of DMS 1100, the database system on the Univac 1108 at the University of Cape Town. The user interface has been constructed in such a way that a computer-naive user can submit a description of his organisation to the system. Thus it constitutes a powerful database design tool, which should greatly alleviate the designer's tasks of communicating with users and of creating an initial database definition. The requirements are formulated using the semantic database model, and semantic information in this model is incorporated into the database as integrity constraints. A relation scheme is also generated from the specification. As a result of this research, insight has been gained into the advantages and shortcomings of the semantic database model, and some principles for 'good' data models and database design methodologies have emerged.
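The final step such a design aid automates, generating a schema from a requirements specification, can be sketched as below for a tiny semantic-model-style input (entity types, attributes, one-to-many relationships). The input format and SQL-like output are illustrative assumptions; the original tool emitted DMS 1100 Data Definition Language, not SQL.

```python
SPEC = {
    "entities": {
        "department": ["dept_no", "name"],
        "employee": ["emp_no", "name", "salary"],
    },
    # child entity -> parent entity (each child belongs to one parent)
    "one_to_many": {"employee": "department"},
}


def relation_schemes(spec):
    """Each entity becomes a relation; each one-to-many link becomes a foreign key."""
    ddl = []
    for entity, attrs in spec["entities"].items():
        cols = [f"  {a} VARCHAR" for a in attrs]
        parent = spec["one_to_many"].get(entity)
        if parent:
            key = spec["entities"][parent][0]
            cols.append(f"  {parent}_{key} VARCHAR REFERENCES {parent}({key})")
        cols[0] = cols[0].replace("VARCHAR", "VARCHAR PRIMARY KEY")
        ddl.append(f"CREATE TABLE {entity} (\n" + ",\n".join(cols) + "\n);")
    return "\n\n".join(ddl)


print(relation_schemes(SPEC))
```

In the same spirit, semantic information from the model (such as the one-to-many constraint above) is carried into the generated schema as integrity constraints rather than being lost in translation.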