2,765 research outputs found

    Integrating spatial data with computing infrastructure and field application

    A significant portion of government agencies' activities requires mobile data collection and analysis in the field. Such tasks draw on a variety of data sources for supporting information, and data integration, especially spatial data integration, becomes a major issue in these applications because of the multitude and heterogeneity of possible data sources. An aim of our research was to develop a flexible and extensible infrastructure that supplies field applications with integrated data sources. New designs treat heterogeneous data sources as a set of object views; this object-view approach integrates multiple data sources into an existing object-oriented view system. The research also extends the previous version of the infrastructure in its data processing and communication capabilities: the Oracle Spatial database has been added as a new type of spatial data source, and a CORBA-based client-server model is implemented as an additional communication channel between the infrastructure and the data access component.
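
    The abstract describes the object-view design only in prose. As a rough illustration, the Python sketch below wraps two heterogeneous spatial sources behind one uniform query interface and integrates them in a single view system; all class and method names are hypothetical, and the Oracle Spatial access is stubbed out rather than implemented.

```python
# Minimal sketch (not the paper's code) of the object-view idea:
# each heterogeneous source is wrapped as a view exposing one
# uniform spatial query interface. All names here are hypothetical.
from abc import ABC, abstractmethod


class ObjectView(ABC):
    """Uniform facade over one heterogeneous spatial data source."""

    @abstractmethod
    def query(self, bbox):
        """Return features intersecting bbox = (xmin, ymin, xmax, ymax)."""


class FileSourceView(ObjectView):
    def __init__(self, features):
        self._features = features          # e.g. parsed from a shapefile

    def query(self, bbox):
        xmin, ymin, xmax, ymax = bbox
        return [f for f in self._features
                if xmin <= f["x"] <= xmax and ymin <= f["y"] <= ymax]


class OracleSpatialView(ObjectView):
    def __init__(self, connection):
        self._conn = connection            # stand-in for a DB-API connection

    def query(self, bbox):
        # A real view would issue a spatial query against the database;
        # this stub returns an empty result to keep the sketch runnable.
        return []


class ViewSystem:
    """Integrates many object views behind a single query call."""

    def __init__(self, views):
        self._views = views

    def query(self, bbox):
        results = []
        for view in self._views:
            results.extend(view.query(bbox))
        return results


system = ViewSystem([FileSourceView([{"x": 1.0, "y": 2.0}]),
                     OracleSpatialView(None)])
print(system.query((0, 0, 10, 10)))        # -> [{'x': 1.0, 'y': 2.0}]
```

    A real implementation would dispatch each view's query through the infrastructure's data access component (e.g., over CORBA); the point of the sketch is only that callers see one interface regardless of the source.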

    A cooperative framework for molecular biology database integration using image object selection

    The theme and concept of 'Molecular Biology Database Integration', and the problems associated with it, initiated the idea for this Ph.D. research. Available technologies make it possible to analyse the data independently and discretely, but they fail to integrate the data resources into more meaningful information; this, along with the integration issues, created the scope for this Ph.D. research. The research reviews 'database interoperability' problems and suggests a framework for integrating molecular biology databases. The framework proposes a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research also reviews other implementation and interoperability issues for laboratory-based, dedicated, target-specific databases. The research addresses the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementations; multi-database query using image object keying; database integration technologies using a context graph; and automated navigation among these databases. This thesis introduces a new approach to database implementation: an interoperable component database concept that initiates multidatabase queries on gene mutation data. A number of data models are proposed for gene mutation data, which form the basis for integrating the target-specific component database with the federated information system. The proposed data models cover genetic trait analysis, the classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows the non-redundant integration approach explained in the thesis: it stores only attributes that do not overlap, by union or intersection, with attributes already present in public domain molecular biology databases. Unlike data warehousing techniques, this feature is novel. The component database is integrated with other biological data sources so that information can be shared in a cooperative environment, which requires developing new tools. The thesis explains the role of these new tools: a metadata extractor, a mapping linker, a query generator and a result interpreter. These tools provide transparent integration without creating any global schema over the participating databases. The thesis also establishes the concept of image object keying for multidatabase query and proposes an algorithm for matching protein spots in gel electrophoresis images. A spot selected by the user in a gel electrophoresis image initiates the query, and the algorithm matches the selected spot against similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase query, which requires writing complex SQL scripts; it also resolves semantic conflicts that exist among molecular biology databases. The research further proposes a new framework, based on the context of web data, for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) increases interoperability by providing descriptions of the resources and a navigation plan for accessing the web-based databases. 
A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. Interactions with the resources are achieved through an integration domain that extracts the required information in a single step, without writing any query scripts. The integration domain allows the query plan to be navigated and executed within the resource databases. An extractor module collects elements from the different target web sources and unifies them into a single object on one page. The proposed framework is tested by finding specific information, e.g., information on Alzheimer's disease, from public domain biology resources such as the Protein Data Bank, the Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis offers further propositions and plans for future work.
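
    As a toy illustration of the image object keying idea described above, the sketch below treats a user-selected protein spot as the query key and ranks candidate spots from other databases by a simple position-and-intensity distance. The feature set, scoring and threshold are assumptions for illustration, not the thesis's actual matching algorithm.

```python
# A toy sketch of "image object keying": a protein spot the user selects
# in a gel electrophoresis image is used as the query key, and candidate
# spots from other resource databases are ranked by similarity. The
# features (position + intensity) and the scoring are illustrative
# assumptions, not the thesis's actual matching algorithm.
import math


def spot_distance(a, b, intensity_weight=0.1):
    """Smaller is more similar; combines position and intensity."""
    positional = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    return positional + intensity_weight * abs(a["intensity"] - b["intensity"])


def key_query(selected_spot, resource_spots, threshold=5.0):
    """Return spots from other resource databases that match the selection."""
    scored = [(spot_distance(selected_spot, s), s) for s in resource_spots]
    return [s for d, s in sorted(scored, key=lambda pair: pair[0])
            if d <= threshold]


selected = {"x": 10.0, "y": 20.0, "intensity": 0.8}
remote = [
    {"x": 10.5, "y": 19.5, "intensity": 0.7, "db": "resource A"},
    {"x": 80.0, "y": 5.0, "intensity": 0.9, "db": "resource B"},
]
print(key_query(selected, remote))   # only the nearby spot from resource A
```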


    An infrastructure for delivering geospatial data to field users

    Federal agencies collect and analyze data to carry out their missions, and a significant portion of these activities requires geospatial data collection in the field. Models for computer-assisted survey information collection are still largely based on the client-server paradigm with symbolic data representation; little attention has been given to digital geospatial information resources or to emerging mobile computing environments. This paper discusses an infrastructure design for delivering geospatial data to users in a mobile field computing environment. Mobile field computing environments vary widely and generally offer extremely limited computing resources, visual display, and bandwidth relative to the resources usually required for distributed geospatial data. Key to handling heterogeneity in the field is an infrastructure design that provides flexibility in the location of computing tasks and returns information in forms appropriate for the field computing environment. A view-agent-based infrastructure has been developed with several components. Wrappers encapsulate not only the data sources but the mobile field environment as well, localizing the details associated with heterogeneity in data sources and field environments. Within the boundaries of the wrappers, mediators and object-oriented views implemented as mobile agents work in a relatively homogeneous environment to generate query results. Mediators receive a request from the user application via the field wrapper and generate a sequence of mobile view agents to search for, retrieve, and process data. The internal infrastructure environment is populated with computation servers that provide a location for processing, especially for combining data from multiple locations. Each computation server has a local object-oriented data warehouse equipped with a set of tools for working with geospatial data. Since query reuse is likely for a field worker, we store final and intermediate results in the data warehouse, allowing the warehouse to act as an active cache. Even when field computing capacity is ample, the warehouse is used to process data so that network traffic can be minimized.
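
    Below is a rough sketch of the mediator-and-warehouse flow described above, with hypothetical names: a mediator answers a field request by dispatching one "view agent" per wrapped source (modeled here as a plain function call), combines the partial results, and caches them in the warehouse so repeated field queries are served locally.

```python
# Illustrative sketch (hypothetical names, not the paper's code) of the
# mediator pattern described above: a mediator receives a field request,
# queries the wrapped sources, combines the results, and caches them in
# a warehouse so repeated field queries avoid new network traffic.

class Warehouse:
    """Object warehouse doubling as an active cache of query results."""
    def __init__(self):
        self._cache = {}

    def get(self, key):
        return self._cache.get(key)

    def put(self, key, value):
        self._cache[key] = value


class Mediator:
    def __init__(self, sources, warehouse):
        self._sources = sources        # wrapped data sources
        self._warehouse = warehouse

    def query(self, region):
        cached = self._warehouse.get(region)
        if cached is not None:         # reuse an earlier field query
            return cached
        # One "view agent" per source; here just a function call,
        # standing in for a mobile agent sent to the data's location.
        partials = [source(region) for source in self._sources]
        combined = [feature for part in partials for feature in part]
        self._warehouse.put(region, combined)
        return combined


roads = lambda region: [("road", region)]
parcels = lambda region: [("parcel", region)]
mediator = Mediator([roads, parcels], Warehouse())
print(mediator.query("tile-42"))   # fetched, combined, then cached
print(mediator.query("tile-42"))   # served from the warehouse cache
```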

    Python as a Federation Tool for GENESIS 3.0

    The GENESIS simulation platform was one of the first broad-scale modeling systems in computational biology to encourage modelers to develop and share model features and components. Supported by a large developer community, it participated in innovative simulator technologies such as benchmarking, parallelization, and declarative model specification, and it was the first neural simulator to define bindings for the Python scripting language. An important feature of the latest version of GENESIS is that it decomposes into self-contained software components complying with the Computational Biology Initiative federated software architecture. This architecture allows separate scripting bindings to be defined for the different necessary components of the simulator, e.g., the mathematical solvers and the graphical user interface. Python is a scripting language that provides rich sets of freely available open source libraries. With clean, dynamic, object-oriented designs, these produce highly readable code and are widely employed in specialized areas of software component integration. We employ a simplified wrapper and interface generator to examine an application programming interface and make it available to a given scripting language. This allows independent software components to be ‘glued’ together and connected to external libraries and applications from user-defined Python or Perl scripts. We illustrate our approach with three examples of Python scripting: (1) generate and run a simple single-compartment model neuron connected to a stand-alone mathematical solver; (2) interface a mathematical solver with GENESIS 3.0 to explore a neuron morphology from either an interactive command line or a graphical user interface; (3) apply scripting bindings to connect the GENESIS 3.0 simulator to external graphical libraries and an open source three-dimensional content creation suite that supports visualization of models based on electron microscopy and their conversion to computational models. Employed in this way, the stand-alone software components of the GENESIS 3.0 simulator provide a framework for progressive federated software development in computational neuroscience.
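
    As a schematic of the scripting-glue idea, the sketch below drives a stand-alone solver component from a user-level Python script, in the way SWIG-generated wrappers expose compiled components. The solver here is a pure-Python stand-in for a single passive compartment, not the actual GENESIS 3.0 bindings; all names are hypothetical.

```python
# Schematic sketch of the scripting-glue idea: a solver component (here a
# pure-Python stand-in, NOT the actual GENESIS bindings) is driven from a
# user script, the way SWIG-generated wrappers expose C components to
# Python. Class and method names are hypothetical.

class CompartmentSolver:
    """Stand-in for a wrapped numerical solver component."""
    def __init__(self, capacitance, conductance, rest_potential):
        self.cm, self.gm, self.em = capacitance, conductance, rest_potential
        self.vm = rest_potential

    def step(self, injected_current, dt):
        # Forward-Euler update of one passive compartment:
        #   cm * dVm/dt = -gm * (Vm - Em) + I
        dv = (-self.gm * (self.vm - self.em) + injected_current) / self.cm
        self.vm += dv * dt
        return self.vm


# User-level script "gluing" the solver to a simple stimulation protocol.
solver = CompartmentSolver(capacitance=1e-6, conductance=1e-4,
                           rest_potential=-0.065)
trace = [solver.step(injected_current=1e-6, dt=1e-4) for _ in range(100)]
print(f"membrane potential after 10 ms: {trace[-1]:.4f} V")
```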

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. The framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database; some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the case studies' execution time by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
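
    As an illustration of the kind of provenance query the web application abstracts, the sketch below stores per-task execution times in a small relational database and asks which tasks dominate each workflow. The schema is a hypothetical stand-in, not BioWorkbench's actual provenance model.

```python
# A minimal sketch (hypothetical schema, not BioWorkbench's actual
# provenance model) of the kind of query the web application abstracts:
# retrieving per-task execution times from a relational provenance store.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_provenance (
                  workflow TEXT, task TEXT, duration_s REAL)""")
conn.executemany(
    "INSERT INTO task_provenance VALUES (?, ?, ?)",
    [("SwiftPhylo", "align", 120.0),
     ("SwiftPhylo", "build_tree", 340.0),
     ("SwiftGECKO", "compare", 55.5)],
)

# Which tasks dominate each workflow's execution time?
for row in conn.execute(
        """SELECT workflow, task, duration_s
           FROM task_provenance
           ORDER BY workflow, duration_s DESC"""):
    print(row)
```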

    Approach and Preliminary Results for Early Growth Technology Analysis

    Even experts cannot be fully aware of all the promising developments in broad and complex fields of technology, such as renewable energy. Fortunately, many diverse sources of information report new technological developments, including journal publications, news stories, and blogs. However, the volume of data contained in these sources is enormous; it would be difficult for a human to read and digest all of this information, especially in a timely manner. This paper describes a novel application of technology mining techniques to these diverse information sources to study, visualize, and identify the evolution of promising new technologies, a challenge we call 'early growth technology analysis'. For the work reported herein, we use as inputs information about millions of published documents contained in sources such as SCIRCUS, Inspec, and Compendex. We accomplish this analysis through bibliometric analysis, consisting of three key steps:
    1. Extract related keywords (from keywords in articles).
    2. Determine the annual occurrence frequencies of these keywords.
    3. Identify keywords exhibiting rapid growth, particularly if starting from a low base.
    To provide a focus for the experiments and subsequent discussions, a pilot study was conducted in the area of 'renewable energy', though the techniques and methods developed are neutral to the domain of study. Preliminary results and conclusions from the case study are presented and discussed in the context of the effectiveness of the proposed methodology.
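
    As a small illustration of step 3, the sketch below flags keywords whose recent annual counts grow rapidly from a low base. The growth criterion (ratio of recent mean to early mean, with a low-base cutoff) and the sample counts are illustrative assumptions, not the paper's exact metric or data.

```python
# A small sketch of the keyword-growth step: given annual occurrence
# counts per keyword, flag keywords whose recent counts grow rapidly
# from a low base. The criterion here is an illustrative assumption,
# not the paper's exact metric.

def rapid_growth_keywords(counts_by_keyword, growth_factor=3.0, low_base=20):
    """counts_by_keyword maps keyword -> list of annual counts (oldest first)."""
    flagged = []
    for keyword, counts in counts_by_keyword.items():
        half = len(counts) // 2
        early, recent = counts[:half], counts[half:]
        early_mean = sum(early) / len(early)
        recent_mean = sum(recent) / len(recent)
        # Rapid growth from a low base: small early counts, large recent ones.
        if early_mean <= low_base and recent_mean >= growth_factor * max(early_mean, 1):
            flagged.append((keyword, early_mean, recent_mean))
    return flagged


annual_counts = {
    "thin-film photovoltaics": [2, 4, 5, 30, 55, 80],           # low base, fast growth
    "wind turbine":            [400, 420, 410, 430, 440, 450],  # mature topic
}
print(rapid_growth_keywords(annual_counts))   # flags only the emerging keyword
```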

    Sharing scientific experiments and workflows in environmental applications

    Environmental applications have been stimulating cooperation among scientists from different disciplines. There are many examples where this cooperation takes place through the exchange of scientific resources, such as data, programs and mathematical models. The LeSelect architecture supports environmental applications in which scientists may share their data and programs. We believe that, besides programs and data, models, experiments and workflows are also scientific resources that need to be shared in environmental applications. Therefore, in this paper we propose an extension to the LeSelect architecture that allows models, experiments and workflows to be shared.
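
    As a minimal illustration of the proposed extension, the sketch below treats models, experiments and workflows as first-class shareable resources alongside data and programs in a small catalog. The API is hypothetical and is not LeSelect's actual interface.

```python
# A minimal sketch (hypothetical API, not LeSelect's actual interface) of
# the proposed extension: models, experiments and workflows become
# first-class shareable resources alongside data and programs.
from dataclasses import dataclass, field

KINDS = {"data", "program", "model", "experiment", "workflow"}


@dataclass
class SharedResource:
    name: str
    kind: str                      # one of KINDS
    owner: str
    metadata: dict = field(default_factory=dict)


class ResourceCatalog:
    """Registry through which cooperating scientists publish and find resources."""
    def __init__(self):
        self._entries = []

    def publish(self, resource):
        if resource.kind not in KINDS:
            raise ValueError(f"unknown resource kind: {resource.kind}")
        self._entries.append(resource)

    def find(self, kind):
        return [r for r in self._entries if r.kind == kind]


catalog = ResourceCatalog()
catalog.publish(SharedResource("rainfall-runoff", "model", "hydrology-lab"))
catalog.publish(SharedResource("flood-study-2001", "experiment", "hydrology-lab"))
print([r.name for r in catalog.find("model")])   # -> ['rainfall-runoff']
```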