36 research outputs found

    Bridging the Gap between Technology and Science with Examples from Ecology and Biodiversity

    Early informatics focused primarily on the application of technology and computer science to a specific domain; modern informatics has broadened to encompass human and knowledge dimensions. Application of technology is only one aspect of informatics. Understanding domain members' issues, priorities, knowledge, abilities, interactions, tasks and work environments is another, and one that directly impacts application success. Involving domain members in the design and development of technology in their domain is a key factor in bridging the gap between technology and science. This user-centered design (UCD) approach to informatics is presented via an ecoinformatics case study in three areas: collaboration, usability, and education and training.

    Generating eScience Workflows from Statistical Analysis of Prior Data

    A number of workflow design tools have been developed specifically to enable easy graphical specification of workflows that ensure systematic scientific data capture and analysis and precise provenance information. We believe that an important component missing from these existing workflow specification and enactment systems is integration with tools that enable prior detailed analysis of the existing data, in particular statistical analysis. By thoroughly analyzing the existing relevant datasets first, it is possible to determine precisely where the existing data is sparse or insufficient and what further experimentation is required. Introducing statistical analysis into experimental design reduces the duplication and costs associated with fruitless experimentation and maximizes opportunities for scientific breakthroughs. In this paper we describe a workflow specification system that we have developed for a particular eScience application (fuel cell optimization). Experimental workflow instances are generated as a result of detailed statistical analysis and interactive exploration of the existing datasets. This is carried out through a graphical data exploration interface that integrates the widely used open-source statistical analysis package R as a web service.
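    The abstract above centers on integrating R, exposed as a web service, into experiment design. The sketch below illustrates that pattern only in outline, assuming a hypothetical HTTP endpoint and response field ("sparse_regions"); it is not the interface of the published system.

```python
# Minimal sketch of the pattern described above: post an existing dataset to an
# R analysis exposed as a web service, then use the returned summary to decide
# where further experiments are needed. The endpoint URL, payload fields and the
# "sparse_regions" response key are hypothetical, not part of the published system.
import json
import urllib.request

def analyse_prior_data(csv_text: str, url: str = "http://localhost:8000/r/density"):
    """Send prior experimental data to an R-backed service and return its summary."""
    payload = json.dumps({"data": csv_text, "method": "kernel_density"}).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        summary = json.load(resp)
    # Regions of parameter space with too few observations become candidate
    # workflow instances for new experiments.
    return summary.get("sparse_regions", [])

# Example: sparse = analyse_prior_data(open("fuel_cell_runs.csv").read())
```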

    Advancing Geospatial Data Curation

    Digital curation is a new term that encompasses ideas from established disciplines: it defines a set of activities to manage and improve the transfer of the increasing volume of data products from producers of digital scientific and academic data to consumers, both now and in the future. Research topics in this new area are at a formative stage, but a variety of work that can serve to advance the curation of digital geospatial data is reviewed and suggested. Active research on geospatial datasets investigates the problems of tracking and reporting the data quality and lineage (provenance) of derived data products in geographic information systems, and of managing varied geoprocessing workflows. Improving the descriptive semantics of geospatial operations will assist some of these existing areas of research, in particular lineage retrieval for geoprocessing results. Emerging issues in geospatial curation include the long-term preservation of frequently updated streams of geospatial data and the establishment of systematic annotation for spatial data collections.
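    A recurring theme above is recording the lineage (provenance) of derived geospatial products so it can be reported later. The following sketch shows one simple way such lineage records could be structured; the field names and example operations are illustrative assumptions, not an established standard.

```python
# Illustrative sketch only: one way to attach lineage (provenance) records to a
# derived geospatial product so that its processing history can be reported.
# The field names and the reproject/buffer operations are examples, not a standard.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageStep:
    operation: str              # e.g. "buffer", "clip", "reproject"
    parameters: dict            # parameters used by the geoprocessing operation
    inputs: list                # identifiers of the source datasets
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class DerivedProduct:
    identifier: str
    lineage: list               # ordered LineageSteps from raw data to product

    def report(self) -> str:
        """Human-readable provenance report for curation or quality review."""
        return "\n".join(f"{s.timestamp} {s.operation}({s.parameters}) <- {s.inputs}"
                         for s in self.lineage)

product = DerivedProduct("floodzone_v2", [
    LineageStep("reproject", {"crs": "EPSG:4326"}, ["dem_raw"]),
    LineageStep("buffer", {"distance_m": 100}, ["rivers_2023"]),
])
print(product.report())
```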

    Scientific Publication Packages: A Selective Approach to the Communication and Archival of Scientific Output

    The use of digital technologies within research has led to a proliferation of data, many new forms of research output and new modes of presentation and analysis. Many scientific communities are struggling with the challenge of how to manage the terabytes of data and new forms of output they are producing. They are also under increasing pressure from funding organizations to publish their raw data, in addition to their traditional publications, in open archives. In this paper I describe an approach that involves the selective encapsulation of raw data, derived products, algorithms, software and textual publications within "scientific publication packages". Such packages provide an ideal method for encapsulating expert knowledge; for publishing and sharing scientific process and results; for teaching complex scientific concepts; and for the selective archival, curation and preservation of scientific data and output. They also provide a bridge between technological advances in the Digital Libraries and eScience domains. In particular, I describe the RDF-based architecture that we are adopting to enable scientists to construct, publish and manage "scientific publication packages": compound digital objects that encapsulate and relate the raw data to its derived products, publications and the associated contextual, provenance and administrative metadata.
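    Since the paper describes an RDF-based architecture for compound digital objects, a minimal sketch of the general idea is given below using the open-source rdflib library. The example namespace, property names and identifiers are hypothetical and are not taken from the paper's actual vocabulary.

```python
# Minimal sketch of a "scientific publication package" as a compound RDF object
# linking raw data, a derived product and a publication. rdflib is used for
# illustration; the example.org namespace and property names are hypothetical.
from rdflib import Graph, Namespace, URIRef, Literal

SPP = Namespace("http://example.org/spp#")
g = Graph()

package = URIRef("http://example.org/packages/reef-study-2006")
g.add((package, SPP.hasRawData, URIRef("http://example.org/data/survey.csv")))
g.add((package, SPP.hasDerivedProduct, URIRef("http://example.org/data/biomass-model.nc")))
g.add((package, SPP.hasPublication, URIRef("https://doi.org/10.1000/example")))
g.add((package, SPP.createdBy, Literal("example-researcher")))

# Serializing the graph gives a single, shareable description of the package.
print(g.serialize(format="turtle"))
```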

    High-throughput bioinformatics with the Cyrille2 pipeline system

    Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system, tracks what data enters the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes them on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system that implements the stated requirements and enables easy creation and execution of high-throughput, flexible bioinformatics pipelines.
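    The three-part design (GUI, Scheduler, Executor) lends itself to a compact illustration. The sketch below shows a heavily simplified scheduler/executor loop in the same spirit; it is an assumption-laden illustration of the design, not Cyrille2's actual implementation.

```python
# Simplified sketch of the Scheduler/Executor split described above: the scheduler
# turns newly arrived data into pending jobs, and the executor picks jobs up and
# runs them. An illustration of the design, not Cyrille2's code.
from collections import deque

class Scheduler:
    def __init__(self, pipeline_steps):
        self.pipeline_steps = pipeline_steps      # ordered analysis steps
        self.queue = deque()                      # pending jobs

    def on_new_data(self, dataset_id):
        # Every pipeline step applied to the new dataset becomes a schedulable job.
        for step in self.pipeline_steps:
            self.queue.append((step, dataset_id))

class Executor:
    def __init__(self, tools):
        self.tools = tools                        # step name -> callable

    def run_pending(self, scheduler):
        while scheduler.queue:
            step, dataset_id = scheduler.queue.popleft()
            self.tools[step](dataset_id)          # on a cluster this would submit a job

scheduler = Scheduler(["quality_check", "align", "annotate"])
executor = Executor({
    "quality_check": lambda d: print(f"QC on {d}"),
    "align": lambda d: print(f"aligning {d}"),
    "annotate": lambda d: print(f"annotating {d}"),
})
scheduler.on_new_data("run_0042")
executor.run_pending(scheduler)
```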

    Polyflow: a Polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs

    Many scientific experiments are modeled as workflows, which usually produce massive amounts of data. To guarantee reproducibility, workflows are typically orchestrated by Workflow Management Systems (WfMSs), which capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by the activities in a workflow, and provenance traces are usually represented as graphs. These graphs allow scientists to analyze and evaluate the results produced by a workflow. However, each WfMS has a proprietary format for provenance and captures it at a different level of granularity. Therefore, in more complex scenarios, in which a scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conducted a Systematic Literature Mapping, assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems and integrates several databases of heterogeneous origin by adopting ProvONE as a global schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real experiments that generate phylogenetic trees through workflows. The results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and a performance standpoint.
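    The core of the approach is mapping provenance exported by different WfMSs onto one global schema so that a single query can span all of them. The sketch below illustrates that integration step with two invented source formats and a ProvONE-like unified record; the field names and mapping rules are assumptions, not Polyflow's actual code.

```python
# Sketch of the integration idea behind Polyflow: provenance records exported by
# different workflow systems are mapped onto one shared schema so a single query
# can span all of them. The two source formats and mapping rules are hypothetical.

def from_system_a(record):
    # System A calls an activity a "task" and its output a "product".
    return {"activity": record["task"], "used": record["inputs"], "generated": record["product"]}

def from_system_b(record):
    # System B nests everything under "step".
    step = record["step"]
    return {"activity": step["name"], "used": step["consumes"], "generated": step["emits"]}

unified = [
    from_system_a({"task": "align_sequences", "inputs": ["reads.fa"], "product": "aln.phy"}),
    from_system_b({"step": {"name": "build_tree", "consumes": ["aln.phy"], "emits": "tree.nwk"}}),
]

# One query over both provenance graphs: which activity generated "tree.nwk"?
lineage = [r["activity"] for r in unified if r["generated"] == "tree.nwk"]
print(lineage)   # ['build_tree']
```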

    Knowledge discOvery And daTa minINg inteGrated (KOATING) Moderators for collaborative projects

    A major issue in any multidisciplinary collaborative project is how best to share and simultaneously exploit different types of expertise, without duplicating effort or inadvertently causing conflicts or loss of efficiency through misunderstanding of individual or shared goals. Moderators are knowledge-based systems designed to support collaborative teams by raising awareness of potential problems or conflicts. However, the functioning of a Moderator is limited by the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges, a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented which enables Moderators to continuously learn from the operational databases of the company and to semi-automatically update their knowledge about team members. This enables the reuse of knowledge discovered from operational databases within collaborative projects. The integration of knowledge discovery in databases (KDD) techniques into the existing Knowledge Acquisition Module of a Moderator allows hidden data dependencies and relationships to be utilised to facilitate the moderation process. The architecture for the Universal Knowledge Moderator (UKM) shows how Moderators can be extended with a learning element that enables them to provide better support for virtual enterprises. Unified Modelling Language diagrams were used to specify the design and development of the proposed system, and the functioning of a UKM is presented using an illustrative example.
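    As a rough illustration of the learning loop described above, the sketch below mines a toy operational log for recurring dependencies and turns the sufficiently frequent ones into moderation alerts. The log format, rule form and confidence threshold are assumptions made for the example.

```python
# Illustrative sketch of the idea: mine an operational log for recurring
# dependencies between changes and their downstream effects, then use the
# discovered rules to raise awareness the next time the antecedent occurs.
from collections import Counter

log = [
    {"actor": "design", "changed": "gearbox_housing", "affected": "assembly_line_config"},
    {"actor": "design", "changed": "gearbox_housing", "affected": "assembly_line_config"},
    {"actor": "design", "changed": "gearbox_housing", "affected": "tooling_plan"},
]

pair_counts = Counter((e["changed"], e["affected"]) for e in log)
antecedent_counts = Counter(e["changed"] for e in log)

# Keep rules whose confidence exceeds a threshold; these become moderator knowledge.
rules = {pair: n / antecedent_counts[pair[0]]
         for pair, n in pair_counts.items()
         if n / antecedent_counts[pair[0]] >= 0.6}

def moderate(change):
    for (changed, affected), confidence in rules.items():
        if changed == change:
            print(f"Alert: changing {changed} has affected {affected} "
                  f"in {confidence:.0%} of past cases - notify that team.")

moderate("gearbox_housing")
```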

    Documenting models and workflows: the next challenge in the field of ecological data management

    Ecological models have become a key part of this scientific discipline. Most of the knowledge created by ecologists is obtained by applying analytical processes to primary data, but much of the information underlying how to create models or use the analytic techniques already published in the scientific literature is not readily available to scientists. We propose the creation of computer tools that help to document, store and execute ecological models and scientific workflows. These tools (called model repositories) are already being developed in other disciplines such as molecular biology and Earth science. We present a model repository (called ModeleR) developed in the context of the Sierra Nevada Global Change Observatory (Granada-Almería, Spain). We believe that model repositories will foster cooperation among scientists, enhancing the creation of relevant knowledge that could be transferred to environmental managers. The development of ModeleR was funded by the Consejería de Medio Ambiente y Ordenación del Territorio of the Junta de Andalucía through the Red de Información Ambiental (REDIAM), under an agreement entitled "Diseño y creación de un repositorio de modelos para la red de información ambiental de Andalucía". A.J. Pérez-Luque thanks MICINN for contract PTA 2011-6322-I.
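    To make the notion of a model repository more concrete, the sketch below shows what a minimal repository entry might hold so that a model can be documented, stored and re-executed. The metadata fields and execution mechanism are illustrative assumptions and do not reflect ModeleR's actual design.

```python
# Minimal sketch of a model-repository entry: enough metadata to document the
# model plus a recorded command so it can be re-executed later. Illustrative only.
import subprocess

class ModelEntry:
    def __init__(self, name, description, inputs, command):
        self.name = name                  # short identifier
        self.description = description    # documents what the model does
        self.inputs = inputs              # named datasets the model expects
        self.command = command            # how to execute the model

    def execute(self, **datasets):
        missing = [i for i in self.inputs if i not in datasets]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        # A real repository might dispatch to a workflow engine instead.
        return subprocess.run(self.command + [datasets[i] for i in self.inputs],
                              capture_output=True, text=True)

registry = {}
registry["snow-cover-trend"] = ModelEntry(
    name="snow-cover-trend",
    description="Fits a yearly trend to satellite-derived snow cover for a catchment.",
    inputs=["snow_cover_csv"],
    command=["Rscript", "snow_trend.R"],
)
```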

    Efficient Indexing Structure for Trajectories in Geographical Information Systems

    Location technologies such as GPS are producing ever more data about moving objects. Spatio-temporal databases store information about the positions of individual objects over time and are needed to manage these data and to solve problems in spatio-temporal applications. Real-world applications of spatio-temporal data include vehicle navigation, migration of people, and the tracking and monitoring of air-, sea- and land-based vehicles. A spatio-temporal database that adopts an exhaustive search strategy for querying trajectories is very time-consuming when processing large datasets under given spatio-temporal query conditions. As a result, efficient spatio-temporal indexing methods are in high demand to improve system performance when searching such large datasets.
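    To illustrate why indexing matters here, the toy sketch below buckets trajectory points into a spatio-temporal grid so that a range query inspects only the overlapping cells rather than scanning every point. The cell sizes and API are arbitrary choices for the example; the paper's own indexing structure is not reproduced.

```python
# Toy grid index for trajectory points: each point is bucketed by a
# (lon cell, lat cell, time bucket) key, so a spatio-temporal range query only
# inspects overlapping cells instead of scanning every stored point.
from collections import defaultdict

CELL, T_BUCKET = 0.01, 3600           # ~1 km grid cells, 1 hour time buckets

index = defaultdict(list)             # (ix, iy, it) -> [(object_id, lon, lat, t)]

def insert(obj_id, lon, lat, t):
    index[(int(lon // CELL), int(lat // CELL), int(t // T_BUCKET))].append((obj_id, lon, lat, t))

def range_query(lon_min, lon_max, lat_min, lat_max, t_min, t_max):
    hits = []
    for ix in range(int(lon_min // CELL), int(lon_max // CELL) + 1):
        for iy in range(int(lat_min // CELL), int(lat_max // CELL) + 1):
            for it in range(int(t_min // T_BUCKET), int(t_max // T_BUCKET) + 1):
                for obj_id, lon, lat, t in index.get((ix, iy, it), []):
                    if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max and t_min <= t <= t_max:
                        hits.append(obj_id)
    return hits

insert("bus_12", -0.1275, 51.5072, 7200)
print(range_query(-0.2, 0.0, 51.4, 51.6, 0, 10000))   # ['bus_12']
```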