    Multi-Fidelity Active Learning with GFlowNets

    In recent decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, progress in machine learning has turned it into a suitable tool for processing and utilising the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high-fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient at exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance on both well-studied synthetic tasks and practically relevant applications in molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design. Code: https://github.com/nikita-0209/mf-al-gf
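    A minimal sketch of the multi-fidelity active-learning loop the abstract describes, in Python. The interfaces here (an oracles list of (cost, query_fn) pairs, a surrogate with fit/predict, and a candidate_sampler standing in for the GFlowNet proposal step) are assumptions for illustration and are not taken from the paper's code.

        import random

        def multi_fidelity_active_learning(oracles, surrogate, candidate_sampler,
                                           budget, batch_size=8):
            """Toy multi-fidelity active-learning loop (illustrative only).

            oracles:           list of (cost, query_fn) pairs, cheapest and least
                               accurate first, most expensive and exact last.
            surrogate:         model with fit(X, y, fidelities) and
                               predict(x, fidelity) -> (mean, std).
            candidate_sampler: callable(surrogate, n) -> list of candidates; this
                               is the role the GFlowNet plays in the paper.
            """
            X, y, fidelities, spent = [], [], [], 0.0
            while spent < budget:
                # Propose a diverse batch of promising candidates.
                for x in candidate_sampler(surrogate, batch_size):
                    # Cost-aware fidelity choice: query the oracle whose prediction
                    # is still uncertain relative to what that query costs.
                    m = max(range(len(oracles)),
                            key=lambda i: surrogate.predict(x, i)[1] / oracles[i][0])
                    cost, query = oracles[m]
                    X.append(x); y.append(query(x)); fidelities.append(m)
                    spent += cost
                    if spent >= budget:
                        break
                surrogate.fit(X, y, fidelities)
            # Rank what was measured; the goal is to surface top-scoring candidates.
            return sorted(zip(X, y, fidelities), key=lambda t: t[1], reverse=True)

        # Toy usage: a 1-D search space, a cheap noisy oracle and a costly exact one.
        class ConstantUncertaintySurrogate:
            """Trivial stand-in surrogate used only to make the sketch runnable."""
            def fit(self, X, y, fidelities):
                pass
            def predict(self, x, fidelity):
                return 0.0, 1.0 / (fidelity + 1)   # pretend higher fidelity is less uncertain

        results = multi_fidelity_active_learning(
            oracles=[(1.0, lambda x: -(x - 0.3) ** 2 + random.gauss(0, 0.1)),
                     (10.0, lambda x: -(x - 0.3) ** 2)],
            surrogate=ConstantUncertaintySurrogate(),
            candidate_sampler=lambda s, n: [random.random() for _ in range(n)],
            budget=50.0,
        )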

    Common challenges and requirements

    Research infrastructures available for researchers in environmental and Earth science are diverse and highly distributed; dedicated research infrastructures exist for atmospheric science, marine science, solid Earth science, biodiversity research, and more. These infrastructures aggregate and curate key research datasets and provide consolidated data services for a target research community, but they also often overlap in scope and ambition, sharing data sources, sometimes even sites, using similar standards, and ultimately all contributing data that will be essential to addressing the societal challenges that face environmental research today. Thus, while their diversity poses a problem for open science and multidisciplinary research, their commonalities mean that they often face similar technical problems and consequently have common requirements when addressing the implementation of best practices in curation, cataloguing, identification and citation, and other related core topics for data science. In this chapter, we review the requirements gathering performed in the context of the cluster of European environmental and Earth science research infrastructures participating in the ENVRI community, and survey the common challenges identified from that requirements-gathering process.

    Applying Secure Multi-party Computation in Practice

    In this work, we present solutions for technical difficulties in deploying secure multi-party computation in real-world applications. We first give a brief overview of the current state of the art, point out several shortcomings, and address them. The main contribution of this work is an end-to-end process description of deploying secure multi-party computation for the first large-scale registry-based statistical study on linked databases. Involving large stakeholders such as government institutions also introduces non-technical requirements, such as signing contracts and negotiating with the Data Protection Agency.
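    As a hedged illustration of the cryptographic primitive underlying secure multi-party computation (not the deployment process or the system described in this work), the sketch below shows additive secret sharing over a prime field in Python: each party holds a random-looking share, and only the combination of all parties' local results reveals the final statistic.

        import secrets

        PRIME = 2 ** 61 - 1   # field modulus; the size is an arbitrary choice here

        def share(value, n_parties):
            """Split `value` into n additive shares modulo PRIME."""
            shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
            shares.append((value - sum(shares)) % PRIME)
            return shares

        def reconstruct(shares):
            return sum(shares) % PRIME

        # Three parties jointly compute a sum of private inputs.
        inputs = [42, 17, 99]                       # each value known only to its owner
        per_party = [share(v, 3) for v in inputs]   # owner i sends its j-th share to party j
        # Each party locally adds the shares it received ...
        local_sums = [sum(column) % PRIME for column in zip(*per_party)]
        # ... and only combining all local sums reveals the total.
        assert reconstruct(local_sums) == sum(inputs)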

    Automatically deriving cost models for structured parallel processes using hylomorphisms

    This work has been partially supported by the EU Horizon 2020 grant “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (TACLe), supported by COST (European Cooperation on Science and Technology), and by EPSRC grant EP/M027317/1 “C33: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence”.

    Structured parallelism using nested algorithmic skeletons can greatly ease the task of writing parallel software, since common but hard-to-debug problems such as race conditions are eliminated by design. However, choosing the best combination of algorithmic skeletons to yield good parallel speedups for a specific program on a specific parallel architecture is still a difficult problem. This paper uses the unifying notion of hylomorphisms, a general recursion pattern, to make it possible to reason about both the functional correctness properties and the extra-functional timing properties of structured parallel programs. We have previously used hylomorphisms to provide a denotational semantics for skeletons, and proved that a given parallel structure for a program satisfies functional correctness. This paper expands on this theme, providing a simple operational semantics for algorithmic skeletons and a cost semantics that can be automatically derived from that operational semantics. We prove that both semantics are sound with respect to our previously defined denotational semantics. This means that we can now automatically and statically choose a provably optimal parallel structure for a given program with respect to a cost model for a (class of) parallel architecture. By deriving an automatic amortised analysis from our cost model, we can also accurately predict parallel runtimes and speedups.
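    A minimal sketch of the hylomorphism recursion pattern the paper builds on, written in Python rather than in the paper's formal setting: a coalgebra unfolds a seed into a node plus sub-seeds, and an algebra folds the recursive results back together. The toy mergesort used as the divide-and-conquer example is illustrative only and is not taken from the paper.

        def hylo(algebra, coalgebra, seed):
            """Hylomorphism: unfold with `coalgebra`, then fold with `algebra`.

            coalgebra(seed)   -> (node, [sub_seed, ...])  builds the call structure
            algebra(node, rs) -> result                   collapses it again
            """
            node, sub_seeds = coalgebra(seed)
            return algebra(node, [hylo(algebra, coalgebra, s) for s in sub_seeds])

        # Toy divide-and-conquer example: mergesort expressed as a hylomorphism.
        def split(xs):                       # coalgebra: split a list in half
            if len(xs) <= 1:
                return xs, []
            mid = len(xs) // 2
            return None, [xs[:mid], xs[mid:]]

        def merge(node, results):            # algebra: merge sorted halves
            if not results:
                return node
            left, right = results
            out, i, j = [], 0, 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    out.append(left[i]); i += 1
                else:
                    out.append(right[j]); j += 1
            return out + left[i:] + right[j:]

        assert hylo(merge, split, [3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]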

    The application of modified adaptive landscapes to heuristic modelling of engine concept designs using sparse data

    The automotive internal combustion engine industry operates in a sector that relies on high production volumes for economies of scale and dedicated production equipment for efficiency of operations and control of quality, yet is subject to the vagaries of a dynamic marketplace, with the need for constant change. These circumstances place pressure on engine designs to be optimised at launch, so as to be competitive and meet market needs, yet remain adaptable to uncertain requirements for change over their production life. Engine designers therefore need concept configuration evaluation tools that can assess architectures for resilience to geometric change over the production life of the product. The problem of being resource efficient whilst having the capacity to adapt to changing environments is one that has been addressed in nature. Natural systems have evolved strategies for satisficing conflicting requirements whilst being resource efficient. The theory of adaptive landscapes helps us to visualise the adaptive capacity of potential morphological forms. A concept attribute analysis methodology based on satisficing and adaptive landscapes has been developed and tested for application to engine concept design. The Plateau, Flooded Adaptive Landscape (PFAL) technique has been evaluated against exemplar engine life histories and shows merit in aiding the decision-making process for concept designers working with sparse data. The process lets the designer visualise the attribute map, enabling them to make better trade-off decisions and share these with non-expert stakeholders to gain their input in final concept choices.

    Compiler-driven data layout transformations for network applications

    This work approaches the little-studied topic of compiler optimisations directed at network applications. It starts by investigating whether there exist any fundamental differences between application domains that justify the development and tuning of domain-specific compiler optimisations. It presents an automated approach that is capable of identifying domain-specific workload characterisations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarise key resource utilisation issues and enable compiler engineers to address the highlighted bottlenecks. By applying this methodology to data-intensive network infrastructure applications, it shows that data organisation is the key obstacle to overcome in order to achieve high performance. It therefore proposes and evaluates three specialised data transformations (structure splitting, array regrouping, and software caching) against the industrial EEMBC networking benchmarks and real-world data sets. It demonstrates, on the one hand, that speedups of up to 2.62 can be achieved, but, on the other, that no single solution performs equally well across different network traffic scenarios. Hence, to address this issue, an adaptive software caching scheme for high-frequency route lookup operations is introduced and its effectiveness evaluated, again against the EEMBC networking benchmarks and real-world data sets, achieving speedups of up to 3.30 and 2.27. The results clearly demonstrate that adaptive data organisation schemes are necessary to ensure optimal performance under varying network loads. Finally, this research addresses a further issue introduced by data transformations such as array regrouping and software caching, namely the need for static analysis to allow efficient resource allocation. This thesis proposes a static code analyser that allows the automatic resource analysis of source code containing lists and tree structures. The tool applies a combination of amortised analysis and separation logic to real code and is able to evaluate the type and resource usage of existing data structures, which can be used to compute global resource consumption values for full data-intensive network applications.
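    A hedged sketch of the software-caching idea evaluated here for high-frequency route lookups, reduced to a small LRU cache placed in front of a slow lookup function; the slow_route_lookup stand-in and the cache size are hypothetical and not taken from the thesis's implementation (which is adaptive rather than plain LRU).

        from collections import OrderedDict

        class RouteCache:
            """Small LRU software cache in front of an expensive route lookup."""

            def __init__(self, lookup_fn, capacity=1024):
                self.lookup_fn = lookup_fn    # e.g. a longest-prefix match over a trie
                self.capacity = capacity
                self.entries = OrderedDict()  # destination -> next hop, in LRU order
                self.hits = self.misses = 0

            def lookup(self, destination):
                if destination in self.entries:
                    self.entries.move_to_end(destination)   # most recently used
                    self.hits += 1
                    return self.entries[destination]
                self.misses += 1
                next_hop = self.lookup_fn(destination)      # fall back to the slow path
                self.entries[destination] = next_hop
                if len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)        # evict least recently used
                return next_hop

        # Usage with a hypothetical slow lookup function.
        def slow_route_lookup(destination):
            return "eth0"                                   # stand-in for a real routing table

        cache = RouteCache(slow_route_lookup, capacity=2)
        for dst in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]:
            cache.lookup(dst)
        print(cache.hits, cache.misses)                     # 1 hit, 3 misses on this trace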

    Transactional data structures

    Concurrent programming is difficult and the effort is rarely rewarded by faster execution. The concurrency problem arises because information cannot pass instantly between processors, resulting in temporal uncertainty. This thesis explores the idea that immutable data and distributed concurrency control can be combined to allow scalable concurrent execution and make concurrent programming easier. A concurrent system that does not impose a global ordering on events lends itself to a scalable distributed implementation. A concurrent programming environment in which the ordering of events affecting an object is enforced locally has intuitive concurrent semantics. This thesis introduces Transactional Data Structures, which are data structures that permit access to past versions, although not all accesses succeed. These data structures form the basis of a concurrent programming solution that supports database-style transactions in memory. Transactional Data Structures permit non-blocking concurrent access to familiar abstract data types such as deques, maps, vectors and priority queues. Using these data structures, a programmer can write a concurrent program in C without having to reason about locks. The solution is evaluated by comparing the performance of a concurrent algorithm to calculate the minimum spanning tree of a graph with that of a similar algorithm which uses Transactional Memory, and by comparing a non-blocking Producer-Consumer Queue with its blocking counterpart.
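    A minimal sketch of the central idea (immutable snapshots plus optimistic, database-style transactions in memory), written in Python rather than the thesis's C implementation; the lock below merely stands in for an atomic compare-and-swap, so this is not the non-blocking protocol the thesis actually develops.

        import threading

        class VersionedMap:
            """Map whose state is an immutable snapshot; writers publish new versions."""

            def __init__(self):
                self._snapshot = {}              # treated as immutable once published
                self._version = 0
                self._lock = threading.Lock()    # stand-in for an atomic compare-and-swap

            def read(self):
                """Return (version, snapshot); readers never wait for writers."""
                return self._version, self._snapshot

            def commit(self, based_on_version, updates):
                """Publish a new version only if nobody committed since we read.

                Returns True on success, False if the transaction must be retried.
                """
                with self._lock:
                    if self._version != based_on_version:
                        return False             # a conflicting commit won; retry
                    new_snapshot = dict(self._snapshot)
                    new_snapshot.update(updates)
                    self._snapshot = new_snapshot   # old versions stay readable
                    self._version += 1
                    return True

        # Optimistic transaction: read a snapshot, compute, try to commit, retry on conflict.
        shared = VersionedMap()

        def transfer(amount):
            while True:
                version, snap = shared.read()
                balance = snap.get("balance", 0)
                if shared.commit(version, {"balance": balance + amount}):
                    return

        transfer(25)
        transfer(-10)
        print(shared.read())                     # (2, {'balance': 15})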

    Artificial intelligence and model checking methods for in silico clinical trials

    Model-based approaches to the safety and efficacy assessment of pharmacological treatments (In Silico Clinical Trials, ISCT) hold the promise to decrease the time and cost of the needed experimentation, reduce the need for animal and human testing, and enable personalised medicine, where treatments tailored for each single patient can be designed before being actually administered. Research in the Virtual Physiological Human (VPH) is pursuing this promise by developing quantitative mechanistic models of patient physiology and drugs. Depending on many parameters, such models define physiological differences among different individuals and different reactions to drug administration. Value assignments to model parameters can be regarded as Virtual Patients (VPs). Thus, as in vivo clinical trials test relevant drugs against suitable candidate patients, ISCT simulate the effect of relevant drugs on VPs covering the possible behaviours that might occur in vivo. Having a population of VPs representative of the whole spectrum of human patient behaviours is a key enabler of ISCT. However, VPH models of practical relevance are typically too complex to be solved analytically or to be formally analysed, so they are usually solved numerically within simulators. In this setting, Artificial Intelligence and Model Checking methods are typically employed. Indeed, a VP coupled with a pharmacological treatment represents a closed-loop model where the VP plays the role of a physical subsystem and the treatment strategy plays the role of the control software. Systems with this structure are known as Cyber-Physical Systems (CPSs). Thus, simulation-based methodologies for CPSs can be employed within personalised medicine in order to compute representative VP populations and to conduct ISCT. In this thesis, we advance the state of the art of simulation-based Artificial Intelligence and Model Checking methods for ISCT in the following directions. First, we present a Statistical Model Checking (SMC) methodology based on hypothesis testing that, given a VPH model as input, computes a population of VPs which is representative (i.e., large enough to represent all relevant phenotypes, with a given degree of statistical confidence) and stratified (i.e., organised as a multi-layer hierarchy of homogeneous sub-groups). Stratification allows ISCT to adaptively focus on specific phenotypes, and also supports prioritisation of patient sub-groups in follow-up in vivo clinical trials. Second, resting on a representative VP population, we design an ISCT aimed at optimising a complex treatment for a patient digital twin, that is, the virtual counterpart of that patient's physiology defined by means of a set of VPs. Our ISCT employs an intelligent search driving a VPH model simulator to seek the lightest but still effective treatment for the input patient digital twin. Third, to enable interoperability among VPH models defined with different modelling and simulation environments, and to increase the efficiency of our ISCT, we also design an optimised simulator driver to speed up backtracking-based search algorithms driving simulators. Finally, we evaluate the effectiveness of the presented methodologies on state-of-the-art use cases and validate our results on retrospective clinical data.
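    A hedged sketch of one standard statistical model checking ingredient related to the first contribution above: sampling virtual patients until an additive Chernoff-Hoeffding bound guarantees, with the requested confidence, that the probability of hitting an unrepresented phenotype is below a threshold. The simulator interface and parameter names are hypothetical, and this is not the thesis's specific hypothesis-testing procedure.

        import math
        import random

        def smc_coverage_check(simulate_patient, is_covered,
                               threshold=0.05, epsilon=0.01, delta=0.01):
            """Check statistically that P(sampled VP is not covered) <= threshold.

            simulate_patient(): draws one random virtual patient (hypothetical interface).
            is_covered(vp):     True if the VP falls in an already represented phenotype.
            The additive Chernoff-Hoeffding bound fixes the number of samples so that
            the estimate is within `epsilon` of the truth with probability >= 1 - delta.
            """
            n = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
            uncovered = sum(1 for _ in range(n) if not is_covered(simulate_patient()))
            estimate = uncovered / n
            # Accept only if even the pessimistic end of the confidence band is acceptable.
            return estimate + epsilon <= threshold, estimate, n

        # Toy usage with a fake simulator in which roughly 1% of patients are uncovered.
        accepted, rate, samples = smc_coverage_check(
            simulate_patient=lambda: random.random(),
            is_covered=lambda vp: vp > 0.01,
        )
        print(accepted, rate, samples)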