600 research outputs found

    InfraPhenoGrid: A scientific workflow infrastructure for Plant Phenomics on the Grid

    Plant phenotyping consists of observing the physical and biochemical traits of plant genotypes in response to environmental conditions. The challenges, in particular in the context of climate change and food security, are numerous. High-throughput platforms have been introduced to observe the dynamic growth of a large number of plants in different environmental conditions. Instead of considering a few genotypes at a time (as is the case when phenomic traits are measured manually), such platforms make it possible to use completely new kinds of approaches. However, the data sets produced by such widely instrumented platforms are huge, constantly growing, and produced by increasingly complex experiments, reaching a point where distributed computation is mandatory to extract knowledge from data. In this paper, we introduce InfraPhenoGrid, the infrastructure we designed and deployed to efficiently manage data sets produced by the PhenoArch plant phenomics platform in the context of the French Phenome Project. Our solution consists of deploying scientific workflows on a Grid using middleware to pilot workflow executions. Our approach is user-friendly in the sense that, despite the intrinsic complexity of the infrastructure, running scientific workflows and understanding the results obtained (using provenance information) is kept as simple as possible for end users.

    Towards Collaborative Scientific Workflow Management System

    The big data explosion has affected several domains in recent years, from research areas to a diverse range of business models. As this large amount of data opens up possibilities for interesting knowledge discoveries, a diverse range of research domains have shifted over the past few years towards analyzing those massive amounts of data. Scientific Workflow Management Systems (SWfMSs) have gained much popularity in recent years for accelerating those data-intensive analyses, visualizations, and discoveries of important information. Data-intensive tasks are often significantly time-consuming and complex in nature, and hence SWfMSs are designed to efficiently support the specification, modification, execution, failure handling, and monitoring of the tasks in a scientific workflow. Given the complexity, dimension, and volume of the data, their effective analysis or management often becomes challenging for an individual and instead requires the collaboration of multiple scientists. Hence, the notion of the 'Collaborative SWfMS' was coined, which has gained significant interest among researchers in recent years, as none of the existing SWfMSs directly support real-time collaboration among scientists. In collaborative SWfMSs, consistency management in the face of conflicting concurrent operations by the collaborators is a major challenge because of the highly interconnected document structure among the computational modules, where any minor change in one part of the workflow can strongly affect other parts of the collaborative workflow through the datalink relations among them.
In addition to consistency management, studies show several other challenges that need to be addressed for a successful design of collaborative SWfMSs, such as sub-workflow composition and execution by different sub-groups, the relationship between scientific workflows and collaboration models, sub-workflow monitoring, and seamless integration and access control of the workflow components among collaborators. In this thesis, we propose a locking scheme to facilitate consistency management in collaborative SWfMSs. The proposed method works by locking workflow components at a granular attribute level in addition to supporting locks on a targeted part of the collaborative workflow. We conducted several experiments to analyze the performance of the proposed method in comparison to related existing methods. Our studies show that the proposed method can reduce the average waiting time of a collaborator by up to 36% while increasing the average workflow update rate by up to 15% in comparison to existing descendent modular level locking techniques for collaborative SWfMSs. We also propose a role-based access control technique for the management of collaborative SWfMSs. We leverage the Collaborative Interactive Application Methodology (CIAM) for the investigation of role-based access control in the context of collaborative SWfMSs. We present our proposed method with a use case from the Plant Phenotyping and Genotyping research domain. Recent studies show that collaborative SWfMSs often present different sets of opportunities and challenges. From our investigations of existing research on collaborative SWfMSs and the findings of our prior two studies, we propose an architecture for collaborative SWfMSs. We propose SciWorCS, a Collaborative Scientific Workflow Management System, as a proof of concept of the proposed architecture; to the best of our knowledge, it is the first of its kind. We present several real-world use cases of scientific workflows using SciWorCS.
Finally, we conduct several user studies using SciWorCS comprising different real-world scientific workflows (i.e., from myExperiment) to understand user behavior and styles of work in the context of collaborative SWfMSs. In addition to evaluating SciWorCS, the user studies reveal several interesting facts that can contribute significantly to the research domain, as none of the existing methods considered such empirical studies and instead relied only on computer-generated simulated studies for evaluation.
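The attribute-level locking idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `AttributeLockTable`, its methods, and the module and attribute names are all hypothetical, and the sketch only shows why locking a single attribute lets two collaborators edit the same module at once.

```python
import threading


class AttributeLockTable:
    """Hypothetical lock table keyed by (module_id, attribute) pairs."""

    def __init__(self):
        # Maps (module_id, attribute) -> name of the collaborator holding it.
        self._owners = {}
        # Guards the table itself against concurrent updates.
        self._guard = threading.Lock()

    def acquire(self, collaborator, module_id, attribute):
        """Try to lock one attribute; return True on success."""
        key = (module_id, attribute)
        with self._guard:
            holder = self._owners.get(key)
            if holder is None or holder == collaborator:
                self._owners[key] = collaborator
                return True
            return False

    def release(self, collaborator, module_id, attribute):
        """Release a lock; only the current holder may release it."""
        key = (module_id, attribute)
        with self._guard:
            if self._owners.get(key) == collaborator:
                del self._owners[key]
                return True
            return False


table = AttributeLockTable()
# Two collaborators lock different attributes of the same module.
alice_ok = table.acquire("alice", "m1", "threshold")
bob_ok = table.acquire("bob", "m1", "input_path")
# A second request for an attribute that is already held is refused.
bob_blocked = not table.acquire("bob", "m1", "threshold")
```

Under a module-level lock, the second collaborator would have to wait for the whole module; here only requests for the same attribute conflict.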

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools or applications. Cloud-based applications allow users to access software from web browsers while relieving them of the need to install any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data need to be processed within a given timeline for more effective analysis. Recently, Big Data technologies have been emerging to facilitate large-scale data processing with commodity hardware. Among the above-mentioned systems, GenAP utilizes Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider totally different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers face challenges in developing analytic tools that support large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little attention to devising a well-defined methodology and frameworks for the flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for the software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to find out the stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge-base for software researchers, developers, and data engineers for cloud based Genotyping and Phenotyping analysis system development

    Comprehensive data infrastructure for plant bioinformatics

    The iPlant Collaborative is a 5-year, National Science Foundation-funded effort to develop cyberinfrastructure to address a series of grand challenges in plant science. The second of these grand challenges is the Genotype-to-Phenotype project, which seeks to provide tools, in the form of a web-based Discovery Environment, for understanding the developmental process from DNA to a full-grown plant. Addressing this challenge requires the integration of multiple data types that may be stored in multiple formats, with varying levels of standardization. Providing for reproducibility requires detailed information documenting the experimental provenance of data and the computational transformations applied to data once it is brought into the iPlant environment. Handling the large quantities of data involved in high-throughput sequencing and other experimental sources of bioinformatics data requires a robust infrastructure for storing and reusing large data objects. We describe the currently planned workflows to be developed for the Genotype-to-Phenotype discovery environment, the data types and formats that must be imported and manipulated within the environment, and the data model that has been developed to express and exchange data within the Discovery Environment, along with the provenance model defined for capturing experimental source and digital transformation descriptions. Capabilities for interaction with reference databases are addressed, focusing not just on the ability to retrieve data from such data sources, but on the ability to use the iPlant Discovery Environment to further populate these important resources. Future activities and the challenges they will present to the data infrastructure of the iPlant Collaborative are also described. © 2010 IEEE

    An Intermediate Data-driven Methodology for Scientific Workflow Management System to Support Reusability

    A workflow is the automatic processing of different logical sub-tasks according to a set of rules. A workflow management system (WfMS) is a system that helps accomplish a complex scientific task by making a sequential arrangement of sub-tasks available as tools. Workflows in a WfMS are formed from modules from various domains, and many collaborators from those domains are involved in the workflow design process. WfMSs have gained popularity in recent years for managing various tools in a system and ensuring dependencies while building a sequence of executions for scientific analyses. As a result of the involvement of heterogeneous tools and the need for collaboration, Collaborative Scientific Workflow Management Systems (CSWfMS) have gained significant interest in the scientific analysis community. In such systems, big data explosion issues arise, with massive velocity and variety characteristics, from the large amount of heterogeneous data from different domains. Therefore, a large amount of heterogeneous data needs to be managed in a Scientific Workflow Management System (SWfMS) with a proper decision mechanism. Although a number of studies have addressed the cost management of data, none of the existing studies address a real-time decision mechanism or a reusability mechanism. Besides, frequent execution of workflows in a SWfMS generates a massive amount of data, and such data are always incremental. Input data or module outcomes of a workflow in a SWfMS are usually large in size. Processing such data-intensive workflows is usually time-consuming, as modules are computationally expensive for their respective inputs.
Besides, the lack of data reusability, limited error recovery, inefficient workflow processing, inefficient storage of derived data, missing metadata association, and missing validation of the effectiveness of a technique in existing systems need to be addressed in a SWfMS for efficient workflow building under the big data explosion. To address these issues, in this thesis we first propose an intermediate data management scheme for a SWfMS. In our second study, we explored the possibilities and introduced an automatic recommendation technique for a SWfMS based on real-world workflow data (i.e., Galaxy [1] workflows); our investigations show that the proposed technique can facilitate 51% of workflow building in a SWfMS by reusing intermediate data of previous workflows and can reduce the execution time of workflow building in a SWfMS by 74%. Later, we propose an adaptive version of our technique that considers the states of tools in a SWfMS, which shows around 40% reusability for workflows. Subsequently, in our fourth study, we conducted several experiments to analyze the performance and explore the effectiveness of the technique in a SWfMS for various environments. The technique emphasizes storage cost reduction, increased data reusability, and faster workflow execution and is, to the best of our knowledge, the first of its kind. The detailed architecture and evaluation of the technique are presented in this thesis. We believe our findings and developed system will contribute significantly to the research domain of SWfMSs.
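The reuse of intermediate data described in this abstract can be pictured with a small sketch. It is an illustration under stated assumptions rather than the thesis's scheme: the `IntermediateStore` class, the choice of a SHA-256 key over tool name, parameters, and input, and the toy `scale` tool are all hypothetical.

```python
import hashlib
import json


class IntermediateStore:
    """Hypothetical cache of module outputs keyed by tool, parameters, and input."""

    def __init__(self):
        self._results = {}
        self.hits = 0  # number of times a stored result was reused

    @staticmethod
    def _key(tool_name, params, data):
        # Assumption: inputs and parameters are JSON-serializable.
        payload = json.dumps([tool_name, params, data], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, tool_name, func, params, data):
        """Run func(data, **params), or return the stored result if one exists."""
        key = self._key(tool_name, params, data)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = func(data, **params)
        self._results[key] = result
        return result


calls = []


def scale(values, factor):
    """Toy stand-in for an expensive workflow module."""
    calls.append(factor)
    return [v * factor for v in values]


store = IntermediateStore()
first = store.run("scale", scale, {"factor": 2}, [1, 2, 3])
second = store.run("scale", scale, {"factor": 2}, [1, 2, 3])  # reused, not recomputed
third = store.run("scale", scale, {"factor": 3}, [1, 2, 3])   # new parameters, recomputed
```

The second call returns the stored output without running the module again, which is the kind of saving the reported execution-time reduction refers to; deciding which results are worth storing is the part the thesis addresses and this sketch leaves out.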

    The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services

    Development and Evaluation of Unmanned Aerial Vehicles for High Throughput Phenotyping of Field-based Wheat Trials.

    Growing demands for increased global yields are driving researchers to develop improved crops, capable of securing higher yields in the face of significant challenges including climate change and competition for resources. However, the ability to measure favourable physical characteristics (phenotypes) of key crops in response to these challenges is limited. For crop breeders and researchers, the current inability to phenotype field-based experiments with sufficient precision, resolution and throughput is restricting any meaningful advances in crop development. This PhD thesis presents work focused on the development and evaluation of Unmanned Aerial Vehicles (UAVs) in combination with remote sensing technologies as a solution for improved phenotyping of field-based crop experiments. Chapter 2 first presents a review of specific target phenotypic traits within the categories of crop morphology and spectral reflectance, together with a critical review of current standard measurement protocols. After reviewing phenotypic traits, focus turns to UAVs and UAV-specific technologies suitable for crop phenotyping, including critical evaluation of both the strengths and current limitations associated with UAV methods and technologies, highlighting specific areas for improvement. Chapter 3 presents a published paper developing and evaluating Structure from Motion photogrammetry for accurate (R² ≥ 0.93, RMSE ≤ 0.077 m, and Bias ≤ -0.064 m) and temporally consistent 3D reconstructions of wheat plot heights. The superior throughput achieved further facilitated measures of crop growth rate through the season, whilst very high spatial resolutions highlighted both the inter- and intra-plot variability in crop heights, something unachievable with traditional manual ruler methods.
Chapter 4 presents published work developing and evaluating modified Commercial ‘Off the Shelf’ (COTS) cameras for obtaining radiometrically calibrated imagery of canopy spectral reflectance. Specifically, development focussed on improving the application of these cameras under variable illumination conditions, via camera exposure, vignetting, and irradiance corrections. Validation of UAV-derived Normalised Difference Vegetation Index (NDVI) from the COTS cameras against a ground spectrometer (0.88 ≤ R² ≤ 0.94) indicated successful calibration and correction of the cameras. The higher spatial resolution obtained from the COTS cameras facilitated assessment of the impact of background soil reflectance on derived mean NDVI measures of experimental plots, highlighting the impact of incomplete canopy on derived indices. Chapter 5 utilises the developed methods and cameras from Chapter 4 to assess the impact of nitrogen fertiliser application on the formation and senescence dynamics of canopy traits over multiple growing seasons. Quantification of changes in canopy reflectance, via NDVI, through three select trends in the wheat growth cycle was used to assess any impact of nitrogen on these periods of growth. Results showed a consistent impact of zero nitrogen application on crop canopies within all three development phases. Additional results found statistically significant positive correlations between quantified phases and harvest metrics (e.g. final yield), with the greatest correlations occurring within the second (Full Canopy) and third (Senescence) phases. Chapter 6 focuses on evaluation of the financial costs and throughput associated with UAVs, with specific focus on comparison to conventional methods in a real-world phenotyping scenario.
A ‘cost throughput’ analysis based on real-world experiments at Rothamsted Research provided a quantitative assessment demonstrating both the financial savings (£4.11 per plot) and the superior throughput (229% faster) obtained from implementing a UAV-based phenotyping strategy for long-term phenotyping of field-based experiments. Overall, the methods and tools developed in this PhD thesis demonstrate that UAVs combined with appropriate remote sensing tools can replicate and even surpass the precision, accuracy, cost and throughput of current strategies.
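For readers unfamiliar with the index used throughout this abstract, NDVI is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below uses that standard definition with made-up reflectance values; the function names and the sample pixels are assumptions and do not come from the thesis. It also shows how bare-soil pixels can pull down a plot's mean NDVI, the effect the abstract attributes to incomplete canopy.

```python
def ndvi(nir, red):
    """Standard NDVI from near-infrared and red reflectance values."""
    denom = nir + red
    if denom == 0:
        # Assumption: treat a pixel with no signal in either band as 0.0.
        return 0.0
    return (nir - red) / denom


def mean_ndvi(pixels):
    """Mean NDVI over a list of (nir, red) reflectance pairs."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)


# Illustrative reflectance pairs (hypothetical, not measured data).
canopy_pixels = [(0.50, 0.05), (0.48, 0.06)]
soil_pixels = [(0.25, 0.20)]

canopy_only = mean_ndvi(canopy_pixels)
with_soil = mean_ndvi(canopy_pixels + soil_pixels)
```

With these values, including the soil pixel lowers the plot mean, which is why high spatial resolution that allows soil to be separated from canopy matters for derived indices.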

    CGIAR Platform on Genetic Gains

    Tools and services to accelerate genetic gains of breeding programs targeting the developing world. The Proposal in its current form was developed with contributions by the following institutions. In the next months, it will be circulated much more widely to the public and private sector to attract wider intellectual contributions to a common agenda

    Plant phenomics, from sensors to knowledge

    Major improvements in crop yield are needed to keep pace with population growth and climate change. While plant breeding efforts have greatly benefited from advances in genomics, profiling the crop phenome (i.e., the structure and function of plants) associated with allelic variants and environments remains a major technical bottleneck. Here, we review the conceptual and technical challenges facing plant phenomics. We first discuss how, given plants’ high levels of morphological plasticity, crop phenomics presents distinct challenges compared with studies in animals. Next, we present strategies for multi-scale phenomics, and describe how major improvements in imaging, sensor technologies and data analysis are now making high-throughput root, shoot, whole-plant and canopy phenomic studies possible. We then suggest that research in this area is entering a new stage of development, in which phenomic pipelines can help researchers transform large numbers of images and sensor data into knowledge, necessitating novel methods of data handling and modelling. Collectively, these innovations are helping accelerate the selection of the next generation of crops more sustainable and resilient to climate change, and whose benefits promise to scale from physiology to breeding and to deliver real world impact for ongoing global food security efforts
