3,304 research outputs found

    Entropy-scaling search of massive biological data

    Get PDF
    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

    Freshwater ecosystem services in mining regions : modelling options for policy development support

    Get PDF
    The ecosystem services (ES) approach offers an integrated perspective of social-ecological systems, suitable for holistic assessments of mining impacts. Yet for ES models to be policy-relevant, methodological consensus in mining contexts is needed. We review articles assessing ES in mining areas focusing on freshwater components and policy support potential. Twenty-six articles were analysed concerning (i) methodological complexity (data types, number of parameters, processes and ecosystem-human integration level) and (ii) potential applicability for policy development (communication of uncertainties, scenario simulation, stakeholder participation and management recommendations). Articles illustrate mining impacts on ES through valuation exercises mostly. However, the lack of ground-and surface-water measurements, as well as insufficient representation of the connectivity among soil, water and humans, leave room for improvements. Inclusion of mining-specific environmental stressors models, increasing resolution of topographies, determination of baseline ES patterns and inclusion of multi-stakeholder perspectives are advantageous for policy support. We argue that achieving more holistic assessments exhorts practitioners to aim for high social-ecological connectivity using mechanistic models where possible and using inductive methods only where necessary. Due to data constraints, cause-effect networks might be the most feasible and best solution. Thus, a policy-oriented framework is proposed, in which data science is directed to environmental modelling for analysis of mining impacts on water ES

    A survey on pre-processing techniques: relevant issues in the context of environmental data mining

    Get PDF
    One of the important issues related with all types of data analysis, either statistical data analysis, machine learning, data mining, data science or whatever form of data-driven modeling, is data quality. The more complex the reality to be analyzed is, the higher the risk of getting low quality data. Unfortunately real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when built over incorrect or incomplete data. As a consequence, the quality of decisions made over these models, also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research is focused on this. In this paper a survey on most popular pre-processing steps required in environmental data analysis is presented, together with a proposal to systematize it. Rather than providing technical details on specific pre-processing techniques, the paper focus on providing general ideas to a non-expert user, who, after reading them, can decide which one is the more suitable technique required to solve his/her problem.Peer ReviewedPostprint (author's final draft

    Fundamental Concepts of Cyber Resilience: Introduction and Overview

    Full text link
    Given the rapid evolution of threats to cyber systems, new management approaches are needed that address risk across all interdependent domains (i.e., physical, information, cognitive, and social) of cyber systems. Further, the traditional approach of hardening of cyber systems against identified threats has proven to be impossible. Therefore, in the same way that biological systems develop immunity as a way to respond to infections and other attacks, so too must cyber systems adapt to ever-changing threats that continue to attack vital system functions, and to bounce back from the effects of the attacks. Here, we explain the basic concepts of resilience in the context of systems, discuss related properties, and make business case of cyber resilience. We also offer a brief summary of ways to assess cyber resilience of a system, and approaches to improving cyber resilience.Comment: This is a preprint version of a chapter that appears in the book "Cyber Resilience of Systems and Networks," Springer 201

    Early-Warning Monitoring Systems for Improved Drinking Water Resource Protection

    Get PDF

    EG-ICE 2021 Workshop on Intelligent Computing in Engineering

    Get PDF
    The 28th EG-ICE International Workshop 2021 brings together international experts working at the interface between advanced computing and modern engineering challenges. Many engineering tasks require open-world resolutions to support multi-actor collaboration, coping with approximate models, providing effective engineer-computer interaction, search in multi-dimensional solution spaces, accommodating uncertainty, including specialist domain knowledge, performing sensor-data interpretation and dealing with incomplete knowledge. While results from computer science provide much initial support for resolution, adaptation is unavoidable and most importantly, feedback from addressing engineering challenges drives fundamental computer-science research. Competence and knowledge transfer goes both ways
    • …
    corecore