941 research outputs found

    Deep generative models for biology: represent, predict, design

    Deep generative models have revolutionized the field of artificial intelligence, fundamentally changing how we generate novel objects that imitate or extrapolate from training data, and transforming how we access and consume various types of information such as text, images, speech, and computer programs. They have the potential to radically transform other scientific disciplines, from mathematical problem solving to fast and accurate simulations in high-energy physics and rapid weather forecasting. In computational biology, generative models hold immense promise for improving our understanding of complex biological processes, designing new drugs and therapies, and forecasting viral evolution during pandemics, among many other applications. However, biological objects pose unique challenges due to their inherent complexity, encompassing massive spaces, multiple complementary data modalities, and a unique interplay between highly structured and relatively unstructured components. In this thesis, we develop several deep generative modeling frameworks that are motivated by key questions in computational biology. Given the interdisciplinary nature of this endeavor, we first provide a comprehensive background in generative modeling, uncertainty quantification, and sequential decision making, as well as important concepts in biology and chemistry, to facilitate a thorough understanding of our work. We then dive into the core of our contributions, which are structured around three chapters. The first chapter introduces methods for learning representations of biological sequences, laying the foundation for subsequent analyses. The second chapter illustrates how these representations can be leveraged to predict complex properties of biomolecules, focusing on three specific applications: protein fitness prediction, the effects of genetic variations on human disease risk, and viral immune escape. Finally, the third chapter is dedicated to methods for designing novel biomolecules, including drug target identification, de novo molecular optimization, and protein engineering. This thesis also makes several methodological contributions to broader machine learning challenges, such as uncertainty quantification in high-dimensional spaces and efficient transformer architectures, which hold potential value in other application domains. We conclude by summarizing our key findings, highlighting shortcomings of current approaches, proposing potential avenues for future research, and discussing emerging trends within the field.

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to better understand the variability between tumors and disease models such as cell lines. To maximize the translatability and clinical relevance of in vitro studies, the selection of optimal cancer models is imperative. We have developed a deep learning-based method to measure the similarity between CRC tumors and disease models such as cancer cell lines. Our method efficiently leverages multiomics data sets containing copy number alterations, gene expression, and point mutations and learns latent factors that describe data in lower dimensions. These latent factors represent the patterns that are clinically relevant and explain the variability of molecular profiles across tumors and cell lines. Using these, we propose refined CRC subtypes and provide best-matching cell lines to different subtypes. These findings are relevant to patient stratification and selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification.

    Enhanced Deep Network Designs Using Mitochondrial DNA Based Genetic Algorithm And Importance Sampling

    Machine learning (ML) is playing an increasingly important role in our lives. It has already made a huge impact in areas such as cancer diagnosis, precision medicine, self-driving cars, natural disaster prediction, and speech recognition. The painstakingly handcrafted feature extractors used in traditional learning, classification, and pattern recognition systems are neither scalable to large datasets nor adaptable to different classes of problems or domains. The resurgence of machine learning in the form of Deep Learning (DL) in the last decade, after multiple AI (artificial intelligence) winters and hype cycles, is a result of the convergence of advancements in training algorithms, the availability of massive data (big data), and innovation in compute resources (GPUs and cloud). If we want to solve more complex problems with machine learning, we need to optimize all three of these areas, i.e., algorithms, datasets, and compute. Our dissertation research presents an original application of the nature-inspired idea of mitochondrial DNA (mtDNA) to improve deep learning network design. Additional fine-tuning is provided by a Monte Carlo-based method called importance sampling (IS). The primary performance indicators for machine learning are model accuracy, loss, and training time. The goal of our dissertation is to provide a framework that addresses all these areas by optimizing network designs (in the form of hyperparameter optimization) and datasets using an enhanced Genetic Algorithm (GA) and importance sampling. Algorithms are by far the most important aspect of machine learning. We demonstrate the application of mitochondrial DNA to complement the standard genetic algorithm for architecture optimization of a deep Convolutional Neural Network (CNN). We use importance sampling to reduce the dataset variance and to sample more often from the instances that add greater value from the training outcome perspective. Finally, we leverage massively parallel and distributed processing on GPUs in the cloud to speed up training. Thus, our multi-approach method for enhancing deep learning combines architecture optimization, dataset optimization, and the power of the cloud to drive better model accuracy and reduce training time.

    Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design

    Multi-cellular robot design aims to create robots composed of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. Initially, this strategy seeks optimal coarse-grained robots and progressively refines them. To mitigate the challenge of determining the precise refinement juncture during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, extensive empirical studies on various challenging tasks sourced from EvoGym show our approach's superior efficiency and generalization capability.

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to study the variability between tumors as well as cancer models such as cell lines, but their translational value is incomplete with current methods. Moreover, first-generation computational methods for subtype classification do not make full use of multi-omics data. Drug discovery programs use cell lines as a proxy for human cancers to characterize their molecular makeup and drug response, identify relevant indications, and discover biomarkers. In order to maximize the translatability and the clinical relevance of in vitro studies, selection of optimal cancer models is imperative. We present a novel subtype classification method based on deep learning and apply it to classify CRC tumors using multi-omics data, and further to measure the similarity between tumors and disease models such as cancer cell lines. Multi-omics Autoencoder Integration (maui) efficiently leverages data sets containing copy number alterations, gene expression, and point mutations, and learns clinically important patterns (latent factors) across these data types. Using these latent factors, we propose a refinement of the gold-standard CRC subtypes, and propose best-matching cell lines for the different subtypes. These findings are relevant for patient stratification and selection of cell lines for drug discovery pipelines, biomarker discovery, and target identification.

    Computation in Complex Networks

    Complex networks are among the most challenging research topics across disciplines including physics, mathematics, biology, medicine, engineering, and computer science. Interest in complex networks is growing due to their ability to model many everyday systems, such as technology networks, the Internet, and communication, chemical, neural, social, political, and financial networks. The Special Issue “Computation in Complex Networks” of Entropy offers a multidisciplinary view on how some complex systems behave, providing a collection of original and high-quality papers within the research fields of community detection, complex network modelling, complex network analysis, node classification, information spreading and control, network robustness, social networks, and network medicine.

    Personalized data analytics for internet-of-things-based health monitoring

    The Internet-of-Things (IoT) has great potential to fundamentally alter the delivery of modern healthcare, enabling healthcare solutions outside the limits of conventional clinical settings. It can offer ubiquitous monitoring to at-risk population groups and allow diagnostic care, preventive care, and early intervention in everyday life. These services can have profound impacts on many aspects of health and well-being. However, this field is still in its infancy, and the use of IoT-based systems in real-world healthcare applications introduces new challenges. Healthcare applications necessitate satisfactory quality attributes such as reliability and accuracy due to their mission-critical nature, while at the same time, IoT-based systems mostly operate over constrained shared sensing, communication, and computing resources. There is a need to investigate this synergy between IoT technologies and healthcare applications from a user-centered perspective. Such a study should examine the role and requirements of IoT-based systems in real-world health monitoring applications. Moreover, conventional computing architectures and data analytic approaches introduced for IoT systems are insufficient when used to target health and well-being purposes, as they are unable to overcome the limitations of IoT systems while fulfilling the needs of healthcare applications. This thesis aims to address these issues by proposing an intelligent use of data and computing resources in IoT-based systems, which can lead to high-level performance and satisfy the stringent requirements. For this purpose, this thesis first delves into the state-of-the-art IoT-enabled healthcare systems proposed for in-home and in-hospital monitoring. The findings are analyzed and categorized into different domains from a user-centered perspective. The selection of home-based applications is focused on the monitoring of the elderly, who require more remote care and support compared to other groups of people. In contrast, the hospital-based applications include the role of existing IoT in patient monitoring and hospital management systems. Then, the objectives and requirements of each domain are investigated and discussed. This thesis proposes personalized data analytic approaches to fulfill the requirements and meet the objectives of IoT-based healthcare systems. In this regard, a new computing architecture is introduced, using computing resources in different layers of IoT to provide a high level of availability and accuracy for healthcare services. This architecture allows the hierarchical partitioning of machine learning algorithms in these systems and enables adaptive system behavior with respect to the user's condition. In addition, personalized data fusion and modeling techniques are presented, exploiting multivariate and longitudinal data in IoT systems to improve the quality attributes of healthcare applications. First, a real-time, missing-data-resilient decision-making technique is proposed for health monitoring systems. The technique tailors various data resources in IoT systems to accurately estimate health decisions despite missing data in the monitoring. Second, a personalized model is presented, enabling variation and event detection in long-term monitoring systems. The model evaluates the sleep quality of users according to their own historical data. Finally, the performance of the computing architecture and the techniques is evaluated in this thesis using two case studies. The first case study consists of real-time arrhythmia detection in electrocardiography signals collected from patients suffering from cardiovascular diseases. The second case study is continuous maternal health monitoring during pregnancy and postpartum. It includes a real human subject trial carried out with twenty pregnant women for seven months.

    Optimizing energy performance of building renovation using traditional and machine learning approaches

    International Energy Agency (IEA) studies show that buildings are responsible for more than 30% of the total energy consumption and an equally large amount of related greenhouse gas emissions. Improving the energy performance of buildings is a critical element of building energy conservation. Furthermore, renovating existing building envelopes and systems offers significant opportunities for reducing Life-Cycle Cost (LCC) and minimizing negative environmental impacts. This approach can be considered one of the key strategies for achieving sustainable development goals at a relatively low cost, especially when compared with the demolition and reconstruction of new buildings. One of the main methodological and technical issues of this approach is selecting a desirable renovation strategy among a wide range of available options. The main motivation behind this research is to bridge the gap between building simulation, optimization algorithms, and Artificial Intelligence (AI) techniques, to take full advantage of the value of their couplings. Furthermore, for whole-building simulation and optimization, current simulation-based optimization models often need thousands of simulation evaluations. Therefore, the optimization becomes infeasible because of the computation time and the complexity of the dependent parameters. One feasible technique to solve this problem is to implement surrogate models that computationally imitate expensive real building simulation models. The aim of this research is three-fold: (1) to propose a Simulation-Based Multi-Objective Optimization (SBMO) model for optimizing the selection of renovation scenarios for existing buildings by minimizing Total Energy Consumption (TEC), LCC, and negative environmental impacts considering Life-Cycle Assessment (LCA); (2) to develop surrogate Artificial Neural Networks (ANNs) for selecting near-optimal building energy renovation methods; and (3) to develop generative deep Machine Learning Models (MLMs) to generate renovation scenarios considering TEC and LCC. This study considers three main areas of building renovation, namely the building envelope, the Heating, Ventilation and Air-Conditioning (HVAC) system, and the lighting system, each of which has a significant impact on building energy performance. On this premise, this research initially develops a framework for data collection and preparation to define the renovation strategies and proposes a comprehensive database including different renovation methods. Using this database, different renovation scenarios can be compared to find the near-optimal scenario based on the renovation strategy. Each scenario is created from the combination of several methods within the applicable strategy. The SBMO model simulates the process of renovating buildings by using the renovation data in energy analysis software to analyze TEC, LCC, and LCA, and identifies the near-optimal renovation scenarios based on the selected renovation methods. Furthermore, an LCA tool is used to evaluate the environmental sustainability of the final decision. It is found that, although the proposed SBMO is accurate, the process of simulation is time consuming. To this end, the second objective focuses on developing robust MLMs to explore the vast and complex data generated from the SBMO model and on developing a surrogate building energy model to predict TEC, LCC, and LCA for all building renovation scenarios. The main advantage of these MLMs is improving the computing time while achieving acceptable accuracy. More specifically, the second developed model integrates the optimization power of SBMO with the modeling capability of ANNs. While the proposed ANNs are found to provide a satisfactory approximation to the SBMO model in a very short period of time, they do not have the capability to generate renovation scenarios. Finally, the third objective focuses on developing a generative deep learning building energy model using Variational Autoencoders (VAEs). The proposed semi-supervised VAEs extract deep features from a whole-building renovation dataset and generate renovation scenarios considering TEC and LCC of existing institutional buildings. The proposed model can also generalize, owing to its potential to reuse the dataset from a specific case in similar situations. The proposed models will potentially offer new avenues in two directions: (1) to predict TEC, LCC, and LCA for different renovation scenarios and select the near-optimal scenario, and (2) to generate renovation scenarios considering TEC and LCC. Architects and engineers can see the effects of different materials, HVAC systems, etc., on energy consumption and make the necessary changes to increase the energy performance of the building. The proposed models encourage the implementation of sustainable materials and components to decrease negative environmental impacts. The ultimate impact of the practical implementation of this research is significant savings in buildings’ energy consumption and more environmentally friendly buildings within the predefined renovation budget.