6,989 research outputs found

    Evolution Strategies for Learning Sparse Matrix Representations of Gene Regulatory Networks

    Currently, a massive amount of temporal gene expression data is available to researchers, which makes it possible to infer Gene Regulatory Networks (GRNs). Gene regulatory networks are theoretical models to represent excitatory and inhibitory interactions between genes. GRNs are useful in understanding how genes function, and hence they are also useful in pharmaceutical and other applications in biology and medicine. However, despite the importance of GRNs, the process of inferring GRNs from observational data is very difficult. This thesis applies evolutionary algorithms to the problem of GRN inference. We propose a novel evolutionary algorithm: hierarchical evolution strategy (HES) to target the specific difficulties in GRN inference. We propose a sparse matrix representation of GRN to account for sparse connectivity in biological gene interactions. Unlike traditional evolution strategies, we divide our optimization into two concurrent processes: connectivity construction and numerical optimization. In each generation, we first establish connectivity structure of the GRN. Inside the same generation, we apply a secondary ES to find the best numerical values with those fixed connections. We also propose a hybrid crowding method to maintain high population diversity while applying the evolutionary algorithms. High population diversity leads to broader exploration area in the search space, therefore preventing premature convergence. The results obtained show that the proposed HES outperforms other algorithms, and has the potential to scale up to realistic problems with thousands of genes
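The two-level idea in this abstract (an outer search over which connections exist, and an inner evolution strategy over the numeric values on those fixed connections) can be illustrated with a much-simplified sketch. This is not the thesis's HES: the linear next-state model, the greedy (1+λ)-style acceptance rule, the single-bit flip mutation, and every function and parameter name below are assumptions made for illustration, and the hybrid crowding method is omitted.

```python
import numpy as np

def fitness(weights, X, Y):
    """Negative mean squared error of an assumed linear model Y ~ X @ weights.T."""
    return -np.mean((X @ weights.T - Y) ** 2)

def inner_es(mask, X, Y, rng, generations=30, offspring=10, sigma=0.3):
    """Secondary ES: tune numeric values only on the connections allowed by `mask`."""
    best = rng.normal(size=mask.shape) * mask
    best_fit = fitness(best, X, Y)
    for _ in range(generations):
        for _ in range(offspring):
            # Gaussian mutation; multiplying by the mask keeps absent edges at zero.
            child = (best + sigma * rng.normal(size=mask.shape)) * mask
            child_fit = fitness(child, X, Y)
            if child_fit > best_fit:
                best, best_fit = child, child_fit
    return best, best_fit

def hierarchical_es(X, Y, rng, generations=20, density=0.2):
    """Outer loop: mutate the connectivity, then score it via the inner ES."""
    n = X.shape[1]
    mask = (rng.random((n, n)) < density).astype(float)  # sparse starting structure
    weights, fit = inner_es(mask, X, Y, rng)
    for _ in range(generations):
        candidate = mask.copy()
        i, j = rng.integers(n, size=2)
        candidate[i, j] = 1.0 - candidate[i, j]  # add or remove one connection
        cand_weights, cand_fit = inner_es(candidate, X, Y, rng)
        if cand_fit > fit:
            mask, weights, fit = candidate, cand_weights, cand_fit
    return mask, weights, fit
```

Here `X` holds expression levels at time t (one row per time point) and `Y` the levels at the next time point; entry `weights[i, j]` stands for the assumed effect of gene j on gene i.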

    Comparative study of three commonly used continuous deterministic methods for modeling gene regulation networks

    BACKGROUND: A gene-regulatory network (GRN) refers to DNA segments that interact through their RNA and protein products and thereby govern the rates at which genes are transcribed. Creating accurate dynamic models of GRNs is gaining importance in biomedical research and development. To improve our understanding of continuous deterministic modeling methods employed to construct dynamic GRN models, we have carried out a comprehensive comparative study of three commonly used systems of ordinary differential equations: the S-system (SS), artificial neural networks (ANNs), and the general rate law of transcription (GRLOT) method. These were thoroughly evaluated in terms of their ability to replicate the reference models' regulatory structure and dynamic gene expression behavior under varying conditions. RESULTS: While the ANN and GRLOT methods appeared to produce robust models even when the model parameters deviated considerably from those of the reference models, SS-based models exhibited a notable loss of performance even when the parameters of the reverse-engineered models corresponded closely to those of the reference models: this is due to the high number of power terms in the SS method, and the manner in which they are combined. In cross-method reverse-engineering experiments, the different characteristics, biases and idiosyncrasies of the methods were revealed. Based on limited training data, with only one experimental condition, all methods produced dynamic models that were able to reproduce the training data accurately. However, an accurate reproduction of regulatory network features was only possible with training data originating from multiple experiments under varying conditions. CONCLUSIONS: The studied GRN modeling methods produced dynamic GRN models exhibiting marked differences in their ability to replicate the reference models' structure and behavior. 
Our results suggest that care should be taken when a method is chosen for a particular application. In particular, reliance on only a single method might unduly bias the results.
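To make the remark about power terms concrete, the standard S-system form models each gene i as dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, so a network of N genes carries 2N(N+1) parameters. The sketch below is a minimal illustration of that form, not the study's code: the forward-Euler integrator, the step size, and all function names are assumptions, and expression levels must stay positive for non-integer exponents to be defined.

```python
import numpy as np

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of the S-system: production minus degradation power-law terms."""
    x = np.asarray(x, dtype=float)
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    # x ** g broadcasts so that entry [i, j] is x[j] ** g[i, j]; the product runs over j.
    production = np.asarray(alpha, dtype=float) * np.prod(x ** g, axis=1)
    degradation = np.asarray(beta, dtype=float) * np.prod(x ** h, axis=1)
    return production - degradation

def simulate_euler(x0, alpha, beta, g, h, dt=0.01, steps=2000):
    """Integrate with forward Euler (an assumed, deliberately simple scheme)."""
    trajectory = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = trajectory[-1]
        trajectory.append(x + dt * s_system_rhs(x, alpha, beta, g, h))
    return np.array(trajectory)
```

With a single gene, `g = [[0]]` and `h = [[1]]`, the model reduces to dX/dt = alpha - beta * X, whose steady state is alpha / beta.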

    Big Data Optimization : Algorithmic Framework for Data Analysis Guided by Semantics

    Thesis defense date: 9 November 2018. Over the past decade, the rapid growth of data generation in all domains of knowledge (traffic, medicine, social networks, industry, etc.) has highlighted the need to improve the analysis of large data volumes, in order to manage them more easily and to discover new relationships hidden in them. Optimization problems, which are commonly found in current industry, are not unrelated to this trend; therefore, Multi-Objective Optimization Algorithms (MOAs) should bear this new scenario in mind. This means that MOAs have to deal with problems that have either various data sources (typically streaming) or huge amounts of data. Indeed, these features in particular are found in Dynamic Multi-Objective Problems (DMOPs), which are related to Big Data optimization problems, mostly with regard to velocity and variability. When dealing with DMOPs, whenever changes in the environment affect the solutions of the problem (i.e., the Pareto set, the Pareto front, or both), and therefore the fitness landscape, the optimization algorithm must react and adapt the search to the new features of the problem. Big Data analytics are long and complex processes; therefore, to simplify them, a series of steps is carried out. A typical analysis is composed of data collection, data manipulation, data analysis, and finally result visualization. In the process of creating a Big Data workflow, the analyst should bear in mind the semantics involving the problem domain knowledge and its data. An ontology is the standard way of describing the knowledge about a domain. As a global target of this PhD thesis, we are interested in investigating the use of semantics in the process of Big Data analysis, focusing not only on machine learning analysis but also on optimization.
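The Pareto front mentioned in this abstract can be illustrated with a small dominance filter. This is a generic textbook sketch, not the thesis's framework: it assumes all objectives are minimized, and the function names are hypothetical. In a dynamic problem, the filter would simply be re-run whenever the objective values change.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the objective vectors (1, 5), (2, 2), (3, 3) and (5, 1), only (3, 3) is dominated (by (2, 2)), so the other three form the front.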

    Within-species lateral genetic transfer and the evolution of transcriptional regulation in Escherichia coli and Shigella

    Background: Changes in transcriptional regulation underlie many of the phenotypic differences observed within and between species of bacteria. Lateral genetic transfer (LGT) can significantly impact the transcription factor (TF) genes which drive these transcriptional changes. Although much emphasis has been placed on LGT of intact genes, the units of transfer and recombination do not necessarily correspond to regions delineated by exact gene boundaries. Here we apply phylogenetic and network-based methods to investigate the relationship between units of lateral transfer and recombination within the Escherichia coli - Shigella clade and the topological properties of genes in the E. coli transcriptional regulatory network (TRN)

    Multiobjective optimization in bioinformatics and computational biology


    Big data analytics in computational biology and bioinformatics

    Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference. The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures. The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about the cellular functions. Very little is known concerning how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, where the “reverse engineering” is involved in predicting how a system works by looking at the system output alone. Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. 
However, there are no known bioinformatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment.
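The observation that combining several tools helps identify true interactions can be sketched as simple edge voting over directed graphs. This is an illustrative assumption, not the dissertation's procedure: each tool's output is taken to be a set of (regulator, target) pairs, and the `min_votes` threshold and the function name are hypothetical.

```python
from collections import Counter

def consensus_edges(predictions, min_votes=2):
    """Return the directed edges predicted by at least `min_votes` tools.

    `predictions` is an iterable with one collection of (regulator, target)
    pairs per tool; direction matters, so ("g1", "g2") != ("g2", "g1").
    """
    # set(edges) ensures a tool cannot vote twice for the same edge.
    votes = Counter(edge for edges in predictions for edge in set(edges))
    return {edge for edge, count in votes.items() if count >= min_votes}
```

Raising `min_votes` trades recall for precision: fewer edges survive, but each surviving edge is supported by more tools.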

    GENECI: A novel evolutionary machine learning consensus-based approach for the inference of gene regulatory networks

    Gene regulatory networks define the interactions between DNA products and other substances in cells. Increasing knowledge of these networks improves the level of detail with which the processes that trigger different diseases are described and fosters the development of new therapeutic targets. These networks are usually represented by graphs, and the primary sources for their correct construction are usually time series from differential expression data. The inference of networks from this data type has been approached differently in the literature. Mostly, computational learning techniques have been implemented, which have finally shown some specialization in specific datasets. For this reason, the need arises to create new and more robust strategies for reaching a consensus based on previous results to gain a particular capacity for generalization. This paper presents GENECI (GEne NEtwork Consensus Inference), an evolutionary machine learning approach that acts as an organizer for constructing ensembles to process the results of the main inference techniques reported in the literature and to optimize the consensus network derived from them, according to their confidence levels and topological characteristics. After its design, the proposal was confronted with datasets collected from academic benchmarks (DREAM challenges and IRMA network) to quantify its accuracy. Subsequently, it was applied to a real-world biological network of melanoma patients whose results could be contrasted with medical research collected in the literature. 
Finally, it has been shown that its ability to optimize the consensus of several networks leads to outstanding robustness and accuracy, gaining a certain generalization capacity after facing the inference of multiple datasets. This work has been partially funded by grant (funded by MCIN/AEI/10.13039/501100011033/) PID2020-112540RB-C41, AETHER-UMA, Spain (A smart data holistic approach for context-aware data analytics: semantics and context exploitation) and Andalusian PAIDI program, Spain with grant P18-RT-2799. Funding for open access charge: Universidad de Málaga, Spain/CBUA. Adrián Segura-Ortiz is supported by Grant FPU21/03837 (Spanish Ministry of Science, Innovation and Universities, Spain).