6 research outputs found

    Synthesis of Scientific Workflows: Theory and Practice of an Instance-Aware Approach

    Get PDF
    The last two decades brought an explosion of computational tools and processes in many scientific domains (e.g., life-, social- and geo-science). Scientific workflows, i.e., computational pipelines, accompanied by workflow management systems, were soon adopted as a de-facto standard among non-computer scientists for orchestrating such computational processes. The goal of this dissertation is to provide a framework that would automate the orchestration of such computational pipelines in practice. We refer to such problems as scientific workflow synthesis problems. This dissertation introduces the temporal logic SLTLx, and presents a novel SLTLx-based synthesis approach that overcomes limitations in handling data object dependencies present in existing synthesis approaches. The new approach uses transducers and temporal goals, which keep track of the data objects in the synthesised workflow. The proposed SLTLx-based synthesis includes a bounded and a dynamic variant, which are shown in Chapter 3 to be NP-complete and PSPACE-complete, respectively. Chapter 4 introduces a transformation algorithm that translates the bounded SLTLx-based synthesis problem into propositional logic. The transformation is implemented as part of the APE (Automated Pipeline Explorer) framework, presented in Chapter 5. It relies on highly efficient SAT solving techniques, using an off-the-shelf SAT solver to synthesise a solution for the given propositional encoding. The framework provides an API (application programming interface), a CLI (command line interface), and a web-based GUI (graphical user interface). The development of APE was accompanied by four concrete application scenarios as case studies for automated workflow composition. The studies were conducted in collaboration with domain experts and presented in Chapter 6. Each of the case studies is used to assess and illustrate specific features of the SLTLx-based synthesis approach. (1) A case study on cartographic map generation demonstrates the ability to distinguish data objects as a key feature of the framework. It illustrates the process of annotating a new domain, and presents the iterative workflow synthesis approach, where the user tries to narrow down the desired specification of the problem in a few intuitive steps. (2) A case study on geo-analytical question answering as part of the QuAnGIS project shows the benefits of using data flow dependencies to describe a synthesis problem. (3) A proteomics case study demonstrates the usability of APE as an ā€œoff-the-shelfā€ synthesiser, providing direct integration with existing semantic domain annotations. In addition, a manual evaluation of the synthesised results shows promising results even on large real-life domains, such as the EDAM ontology and the complete bio.tools registry. (4) A geo-event question-answering study demonstrates the usability of APE within a larger question-answering system. This dissertation answers the goals it sets to solve. It provides a formal framework, accompanied by a lightweight library, which can solve real-life scientific workflow synthesis problems. Finally, the development of the library motivated an upcoming collaborative project in the life sciences domain. The aim of the project is to develop a platform which would automatically compose (using APE) and benchmark workflows in computational proteomics

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently ā€“ to become ā€˜smartā€™ and ā€˜sustainableā€™. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ā€˜bigā€™ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently ā€“ to become ā€˜smartā€™ and ā€˜sustainableā€™. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ā€˜bigā€™ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    Applications of new forms of data to demographics

    Get PDF
    At the outset, this thesis sets out to address limitations in conventional population data for the representation of stocks and flows of human populations. Until now, many of the data available for studying population behaviour have been static in nature, often collected on an infrequent basis or in an inconsistent manner. However, rapid expansion in the use of online technologies has led to the generation of a huge volume of data as a byproduct of individualsā€™ online activities. This thesis sets out to exploit just one of these new data channels: raw geographically referenced messages collected by the Twitter Online Social Network. The thesis develops a framework for the creation of functional population inventories from Twitter. Through the application of various data mining and heuristic techniques, individual Twitter users are attributed with key demographic markers including age, gender, ethnicity and place of residence. However, while these inventories possess the required data structure for analysis, little is understood about whom they represent and for what purposes they may be reliably employed. Thus a primary focus of this thesis is the assessment of Twitter-based population inventories at a range of spatial scales from the local to the global. More specifically, the assessment considers issues of age, gender, ethnicity, geographic distribution and surname composition. The value of such rich data is demonstrated in the final chapter in which a detailed analysis of the stocks and flows of peoples within the four largest London airports is undertaken. The analysis demonstrates both the extraction of conventional insight, such as passenger statistics and new insights such as footfall and sentiment. The thesis concludes with recommendations for the ways in which social media analysis may be used in demographics to supplement the analysis of populations using conventional sources of data

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently ā€“ to become ā€˜smartā€™ and ā€˜sustainableā€™. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ā€˜bigā€™ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    High-throughput geocomputational workflows in a grid environment

    Get PDF
    A grid-computing platform facilitates geocomputational workflow composition to process big geosciences data while fully using idle resources to accelerate processing speed. An experiment with aerosol optical depth retrieval from satellite data shows a 25 percent improvement in runtime over a single high-performance computer.N/
    corecore