
    Towards Data-Driven Large Scale Scientific Visualization and Exploration

    Technological advances have enabled us to acquire extremely large datasets, but it remains a challenge to store, process, and extract information from them. This dissertation builds upon recent advances in machine learning, visualization, and user interaction to facilitate exploration of large-scale scientific datasets. First, we use data-driven approaches to computationally identify regions of interest in the datasets. Second, we use visual presentation for effective user comprehension. Third, we provide interactions that let human users integrate domain knowledge and semantic information into the exploration process. Our research shows how to extract, visualize, and explore informative regions in very large 2D landscape images, 3D volumetric datasets, high-dimensional volumetric mouse brain datasets with thousands of spatially mapped gene expression profiles, and geospatial trajectories that evolve over time. The contributions of this dissertation include: (1) a sliding-window saliency model that discovers regions of user interest in very large images; (2) visual segmentation of intensity-gradient histograms to identify meaningful components in volumetric datasets; (3) extraction of boundary surfaces from a wealth of volumetric gene expression profiles of the mouse brain to personalize the reference brain atlas; and (4) an efficient way to cluster geospatial trajectories by mapping each sequence of locations to a high-dimensional point with the kernel distance framework. We aim to discover patterns, relationships, and anomalies that could lead to new scientific, engineering, and medical advances. This work represents one of the first steps toward better visual understanding of large-scale scientific data by combining machine learning and human intelligence.
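
    A minimal sketch of the kernel distance behind contribution (4), assuming a Gaussian kernel K(p, q) = exp(-||p - q||^2 / sigma^2) and treating each trajectory as an unweighted set of 2D points. The function names and the bandwidth `sigma` are illustrative assumptions, and the explicit high-dimensional embedding used in the dissertation is not reproduced. The squared distance is kappa(P, P) + kappa(Q, Q) - 2 kappa(P, Q), where kappa averages the kernel over all point pairs.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gaussian_kernel(p: Point, q: Point, sigma: float) -> float:
    """Gaussian kernel exp(-||p - q||^2 / sigma^2) for 2D points."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / sigma ** 2)


def kappa(P: Sequence[Point], Q: Sequence[Point], sigma: float) -> float:
    """Average kernel similarity over all pairs (p, q) with p in P, q in Q."""
    total = sum(gaussian_kernel(p, q, sigma) for p in P for q in Q)
    return total / (len(P) * len(Q))


def kernel_distance(P: Sequence[Point], Q: Sequence[Point], sigma: float = 1.0) -> float:
    """D_K(P, Q) = sqrt(kappa(P, P) + kappa(Q, Q) - 2 kappa(P, Q))."""
    d2 = kappa(P, P, sigma) + kappa(Q, Q, sigma) - 2.0 * kappa(P, Q, sigma)
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(d2, 0.0))


if __name__ == "__main__":
    road = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    nearby = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
    far_away = [(50.0, 50.0), (51.0, 50.0), (52.0, 50.0)]
    print(kernel_distance(road, nearby), kernel_distance(road, far_away))
```

    Computed directly, each distance costs O(|P||Q|) kernel evaluations; an explicit feature map (for example, random Fourier features, assumed here only for illustration) is one way to turn each trajectory into a single vector so that standard clustering algorithms can be applied.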

    Biological evolution through mutation, selection, and drift: An introductory review

    Motivated by present activities in (statistical) physics directed towards biological evolution, we review the interplay of three evolutionary forces: mutation, selection, and genetic drift. The review addresses itself to physicists and intends to bridge the gap between the biological and the physical literature. We first clarify the terminology and recapitulate the basic models of population genetics, which describe the evolution of the composition of a population under the joint action of the various evolutionary forces. Building on these foundations, we specify the ingredients explicitly, namely, the various mutation models and fitness landscapes. We then review recent developments concerning models of mutational degradation. These predict upper limits for the mutation rate above which mutation can no longer be controlled by selection, the most important phenomena being error thresholds, Muller's ratchet, and mutational meltdowns. Error thresholds are deterministic phenomena, whereas Muller's ratchet requires the stochastic component brought about by finite population size. Mutational meltdowns additionally rely on an explicit model of population dynamics and describe the extinction of populations. Special emphasis is put on the mutual relationship between these phenomena. Finally, a few connections with the process of molecular evolution are established.
    Comment: 62 pages, 6 figures, many references
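
    To make the three forces concrete, here is a textbook haploid, two-allele Wright-Fisher sketch (an illustrative assumption, not one of the review's specific models): selection and mutation act deterministically on the allele frequency each generation, and drift enters through binomial resampling of a finite population of size N. The parameter names `s`, `u`, `v`, and `N` are chosen for illustration.

```python
import random
from typing import List


def wright_fisher(p0: float, N: int, s: float, u: float, v: float,
                  generations: int, rng: random.Random) -> List[float]:
    """Frequency trajectory of allele A in a haploid Wright-Fisher population.

    s: selective advantage of A (relative fitness 1 + s versus 1 for a)
    u: mutation rate A -> a;  v: mutation rate a -> A
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: reweight by relative fitness (deterministic).
        mean_fitness = p * (1.0 + s) + (1.0 - p)
        p_sel = p * (1.0 + s) / mean_fitness
        # Mutation: flows in both directions (deterministic).
        p_mut = p_sel * (1.0 - u) + (1.0 - p_sel) * v
        # Drift: draw N offspring, each carrying A with probability p_mut.
        count = sum(1 for _ in range(N) if rng.random() < p_mut)
        p = count / N
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(42)
    neutral = wright_fisher(0.5, N=20, s=0.0, u=0.0, v=0.0, generations=2000, rng=rng)
    print("final frequency under pure drift:", neutral[-1])
```

    With u = v = 0 and s = 0, drift alone eventually fixes or loses the allele in a small population. Muller's ratchet would require tracking mutation counts across many loci, which this single-locus sketch does not attempt.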

    Mining for genotype-phenotype relations in Saccharomyces using partial least squares

    Background: Multivariate approaches are important because of their versatility and their applications in many fields, and they provide decisive advantages over univariate analysis in many ways. Genome-wide association studies are rapidly emerging, but existing approaches pay little attention to multivariate relations between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations.
    Results: Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the genotype-phenotype relationship involves surprisingly few genes, in the sense that an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set of 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichment in functions such as cellular respiration and transposition. These genes were also enriched in known paralogs, stop codon variations, and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on the recent adaptation of Saccharomyces yeasts to environmental changes in their ecological niche.
    Conclusions: The BLAST- and PLS-based multivariate approach produced results that adhere to the known yeast phylogeny and gene ontology, and thus verify that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future investigations should aim to improve the computation of genotype signals as well as the variable selection procedure within the PLS framework.
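
    A pure-Python sketch of one soft-thresholded PLS component, assuming a single response `y`, column-centered predictors, and a threshold `delta` expressed as a fraction of the largest absolute loading weight. It illustrates the soft-thresholding idea behind ST-PLS and is not the authors' implementation, which also involves multiple components and the BLAST-derived genotype signals. Variables whose weights are shrunk to exactly zero drop out, which is how a small subset of genes can be selected.

```python
import math
from typing import List, Sequence, Tuple


def st_pls_component(X: Sequence[Sequence[float]], y: Sequence[float],
                     delta: float) -> Tuple[List[float], float, List[int]]:
    """One PLS1 component with soft-thresholded loading weights.

    X: n samples x p variables; y: n responses; 0 <= delta < 1 is the
    threshold as a fraction of the largest absolute weight.
    Returns (weights, regression coefficient on the score, selected indices).
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    n, p = len(X), len(X[0])
    # Center every column of X and the response y.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Loading weights proportional to the covariance X^T y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w_max = max(abs(v) for v in w)
    if w_max == 0.0:
        raise ValueError("y is uncorrelated with every column of X")

    # Soft-thresholding: shrink every weight toward zero, zeroing small ones.
    cut = delta * w_max
    w = [math.copysign(max(abs(v) - cut, 0.0), v) for v in w]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Score vector t = Xc w and least-squares coefficient of yc on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    selected = [j for j, v in enumerate(w) if v != 0.0]
    return w, q, selected
```

    Raising `delta` toward 1 keeps fewer variables; `delta = 0` reduces to an ordinary PLS1 weight vector.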

    Adaptive algorithms for history matching and uncertainty quantification

    Numerical reservoir simulation models are the basis for many decisions regarding the prediction, optimisation, and improvement of the production performance of oil and gas reservoirs. Because model parameters are uncertain, history matching is required to calibrate models to the dynamic behaviour of the reservoir. A set of history-matched models is then used for reservoir performance prediction and for economic and risk assessment of different development scenarios. Various algorithms are employed to search and sample the parameter space in history matching and uncertainty quantification problems. The choice of algorithm and its implementation, governed by a number of control parameters, have a significant impact on the effectiveness and efficiency of the algorithm and thus on the quality of the results and the speed of the process. This thesis is concerned with the investigation, development, and implementation of improved and adaptive algorithms for reservoir history matching and uncertainty quantification problems. A set of evolutionary algorithms is considered and applied to history matching. The shared characteristic of these algorithms is adaptation by balancing exploration and exploitation of the search space, which can lead to improved convergence and diversity. This includes the use of estimation of distribution algorithms, which implicitly adapt their search mechanism to the characteristics of the problem. Hybridising them with genetic algorithms, multiobjective sorting algorithms, and real-coded, multi-model, and multivariate Gaussian-based models can help these algorithms adapt further and improve their performance. Finally, diversity measures are used to develop an explicit, adaptive algorithm and to control the algorithm's performance based on the structure of the problem. Uncertainty quantification in a Bayesian framework can be carried out by resampling the search space using Markov chain Monte Carlo sampling algorithms. Common critiques of these algorithms are their low efficiency and their need for control parameter tuning. A Metropolis-Hastings sampling algorithm with an adaptive multivariate Gaussian proposal distribution and a K-nearest neighbour approximation has been developed and applied.
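
    The sampler named in the last sentence combines Metropolis-Hastings with an adaptive Gaussian proposal. A one-dimensional sketch of that adaptation idea, in the style of Haario et al.'s adaptive Metropolis algorithm (an assumption, since the thesis's exact scheme is not described here), sets the proposal variance from the running variance of the chain. The K-nearest neighbour approximation is omitted, and `adapt_start`, `scale`, and `eps` are illustrative parameters.

```python
import math
import random
from typing import Callable, List


def adaptive_metropolis(log_target: Callable[[float], float], x0: float,
                        n_samples: int, rng: random.Random,
                        adapt_start: int = 500, scale: float = 2.4 ** 2,
                        eps: float = 1e-6) -> List[float]:
    """Random-walk Metropolis-Hastings whose Gaussian proposal variance
    is set from the running sample variance of the chain (1-D case)."""
    x, lp = x0, log_target(x0)
    chain = [x]
    mean, m2 = x, 0.0          # Welford accumulators for the chain so far
    proposal_var = 1.0         # fixed variance before adaptation starts
    for i in range(1, n_samples):
        if i > adapt_start:
            # Haario-style update: scaled chain variance plus a small jitter.
            proposal_var = scale * (m2 / (i - 1) + eps)
        y = x + rng.gauss(0.0, math.sqrt(proposal_var))
        lp_y = log_target(y)
        # Symmetric proposal, so accept with probability min(1, pi(y)/pi(x)).
        if rng.random() < math.exp(min(0.0, lp_y - lp)):
            x, lp = y, lp_y
        chain.append(x)
        # Welford update of the running mean and sum of squared deviations.
        d = x - mean
        mean += d / (i + 1)
        m2 += d * (x - mean)
    return chain


if __name__ == "__main__":
    samples = adaptive_metropolis(lambda z: -0.5 * z * z, 0.0, 20000, random.Random(7))
    kept = samples[2000:]   # discard burn-in
    print(sum(kept) / len(kept))
```

    Adapting the proposal removes the manual step-size tuning criticised above; the small `eps` term keeps the proposal from collapsing if the chain stalls.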

    The Energy Landscape Analysis of Cancer Mutations in Protein Kinases

    The growing interest in quantifying the molecular basis of protein kinase activation and allosteric regulation by cancer mutations has fueled computational studies of allosteric signaling in protein kinases. In the present study, we combined computer simulations and energy landscape analysis of protein kinases to characterize the interplay between oncogenic mutations and locally frustrated sites as important catalysts of allosteric kinase activation. While the structurally rigid kinase core constitutes a minimally frustrated hub of the catalytic domain, locally frustrated residue clusters, whose interaction networks are not energetically optimized, are prone to dynamic modulation and could enable allosteric conformational transitions. The results of this study show that the energy landscape effect of oncogenic mutations may be allosteric, eliciting global changes in the spatial distribution of highly frustrated residues. We found that mutation-induced allosteric signaling may involve a dynamic coupling between structurally rigid (minimally frustrated) and plastic (locally frustrated) clusters of residues. This study demonstrates that activating cancer mutations may affect the thermodynamic equilibrium between kinase states by allosterically altering the distribution of locally frustrated sites: increasing local frustration in the inactive form while eliminating locally frustrated sites and restoring the structural rigidity of the active form. The energy landscape analysis of protein kinases and the proposed role of locally frustrated sites in activation mechanisms may have useful implications for bioinformatics-based screening and detection of functional sites critical for allosteric regulation in complex biomolecular systems.

    Towards an AEC-AI Industry Optimization Algorithmic Knowledge Mapping: An Adaptive Methodology for Macroscopic Conceptual Analysis

    The Architecture, Engineering, and Construction (AEC) industry is one of the most important productive sectors, and hence also has a high impact on economic balances, societal stability, and the global challenges of climate change. In its adoption of technologies, applications, and processes, it is also known for its status quo, its slow pace of innovation, and its conservative approaches. However, a new technological era, Industry 4.0 fueled by AI, is driving productive sectors into a highly pressurized global technological competition and sociopolitical landscape. In this paper, we develop an adaptive approach to mining text content in the research literature corpus related to the AEC and AI (AEC-AI) industries, in particular its relation to technological processes and applications. We present a first-stage approach to an adaptive assessment of AI algorithms, to form an integrative AI platform in the AEC industry, the AEC-AI industry 4.0. At this stage, a macroscopic adaptive method is deployed to characterize "Optimization," a key term in the AEC-AI industry, using a mixed methodology that incorporates machine learning and a classical evaluation process. Our results show that effective use of metadata, constrained search queries, and domain knowledge makes it possible to obtain a macroscopic assessment of the target concept. This allows the extraction of a high-level mapping and a characterization of the conceptual structure of the literature corpus. At this level, the results are comparable to those of classical literature review methodologies. In addition, our method is designed for adaptive assessment so that further stages can be incorporated.
    This work was supported by the CONICYT/FONDECYT/INICIACION under Grant 11180056 to Jose Garcia and the Spanish Ministry of Science and Innovation through the FEDER Funding under Project PID2020-117056RB-I00 to Victor Yepes.
    Maureira, C.; Pinto, H.; Yepes, V.; García, J. (2021). Towards an AEC-AI Industry Optimization Algorithmic Knowledge Mapping: An Adaptive Methodology for Macroscopic Conceptual Analysis. IEEE Access. 9:110842-110879. https://doi.org/10.1109/ACCESS.2021.3102215

    Variable fidelity modeling as applied to trajectory optimization for a hydraulic backhoe

    Modeling, simulation, and optimization play vital roles throughout the engineering design process; however, in many design disciplines the cost of simulation is high, and designers face a tradeoff between the number of alternatives that can be evaluated and the accuracy with which they can be evaluated. In this thesis, a methodology is presented for using models of various levels of fidelity during the optimization process. The intent is to use inexpensive, low-fidelity models with limited accuracy to recognize poor design alternatives, and to reserve the accurate but expensive high-fidelity models for characterizing only the best alternatives. Specifically, by setting a user-defined performance threshold, the optimizer can explore the design space using a low-fidelity model by default and switch to a higher-fidelity model only when the performance threshold is attained. In this manner, the high-fidelity model is used only to discern the best solution from the set of good solutions, so that computational resources are conserved until the optimizer is close to the solution. This makes the optimization process more efficient without sacrificing the quality of the solution. The method is illustrated by optimizing the trajectory of a hydraulic backhoe. To characterize the robustness and efficiency of the method, a design space exploration is performed using both the low- and high-fidelity models, and the optimization problem is solved multiple times using the variable fidelity framework.
    M.S. Committee Chair: Paredis, Chris; Committee Member: Bras, Bert; Committee Member: Burkhart, Roger; Committee Member: Choi, Seung-Kyu
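
    A minimal sketch of the threshold-switching rule, assuming minimization over a fixed list of candidate designs rather than a full optimizer. `low_fi`, `high_fi`, and `threshold` are hypothetical stand-ins for the backhoe models, and the toy cost functions in the usage block are invented for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def variable_fidelity_search(candidates: Iterable[float],
                             low_fi: Callable[[float], float],
                             high_fi: Callable[[float], float],
                             threshold: float) -> Tuple[Optional[float], float, int]:
    """Screen every candidate with the cheap model; call the expensive model
    only for candidates whose low-fidelity cost meets the threshold.

    Minimization is assumed. Returns (best design, its high-fidelity cost,
    number of high-fidelity evaluations).
    """
    best: Optional[float] = None
    best_cost = float("inf")
    high_calls = 0
    for x in candidates:
        if low_fi(x) > threshold:
            continue                  # poor design: rejected by the cheap model
        cost = high_fi(x)             # promising design: evaluate accurately
        high_calls += 1
        if cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost, high_calls


if __name__ == "__main__":
    def true_cost(x: float) -> float:      # stand-in "high-fidelity" model
        return (x - 1.3) ** 2

    def cheap_cost(x: float) -> float:     # biased "low-fidelity" approximation
        return (x - 1.0) ** 2

    grid = [i * 0.1 for i in range(31)]    # candidate designs 0.0 .. 3.0
    best, cost, calls = variable_fidelity_search(grid, cheap_cost, true_cost, 0.5)
    print(best, cost, calls)               # calls == 15 of 31 candidates
```

    Even though the cheap model's optimum (x = 1.0) is biased, the threshold only has to be loose enough to let the true optimum through; the expensive model then picks the best design among the survivors.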

    Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations

    The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T), suspension growth adaptation (293S), and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (in this case, ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in a steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.