"Going back to our roots": second generation biocomputing
Researchers in the field of biocomputing have, for many years, successfully
"harvested and exploited" the natural world for inspiration in developing
systems that are robust, adaptable and capable of generating novel and even
"creative" solutions to human-defined problems. However, in this position paper
we argue that the time has now come for a reassessment of how we exploit
biology to generate new computational systems. Previous solutions (the "first
generation" of biocomputing techniques), whilst reasonably effective, are crude
analogues of actual biological systems. We believe that a new, inherently
inter-disciplinary approach is needed for the development of the emerging
"second generation" of bio-inspired methods. This new modus operandi will
require much closer interaction between the engineering and life sciences
communities, as well as a bidirectional flow of concepts, applications and
expertise. We support our argument by examining, in this new light, three
existing areas of biocomputing (genetic programming, artificial immune systems
and evolvable hardware), as well as an emerging area (natural genetic
engineering) which may provide useful pointers as to the way forward.
Comment: Submitted to the International Journal of Unconventional Computing
OMMA enables population-scale analysis of complex genomic features and phylogenomic relationships from nanochannel-based optical maps.
Background: Optical mapping is an emerging technology that complements sequencing-based methods in genome analysis. It is widely used to improve genome assemblies and detect structural variations by providing information over much longer (up to 1 Mb) reads. Current standards in optical mapping analysis involve assembling optical maps into contigs and aligning them to a reference, which is limited to pairwise comparison and becomes bias-prone when analyzing multiple samples.
Findings: We present a new method, OMMA, that extends optical mapping to the study of complex genomic features by simultaneously interrogating optical maps across many samples in a reference-independent manner. OMMA captures and characterizes complex genomic features, e.g., multiple haplotypes, copy number variations, and subtelomeric structures, when applied to 154 human samples across the 26 populations sequenced in the 1000 Genomes Project. For small genomes such as those of pathogenic bacteria, OMMA accurately reconstructs the phylogenomic relationships and identifies functional elements across 21 Acinetobacter baumannii strains.
Conclusions: With the increasing data throughput of optical mapping systems, the use of this technology in comparative genome analysis across many samples will become feasible. OMMA is a timely solution that addresses this computational need. The OMMA software is available at https://github.com/TF-Chan-Lab/OMTools
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and several string problems. In addition, references to other relevant optimization problems are given, including those related to medical imaging and gene selection for classification. From this analysis, the paper derives insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
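As a concrete illustration of the kind of metaheuristic applied to string problems in bioinformatics, here is a minimal simulated-annealing sketch for the closest-string problem: finding a string that minimizes the maximum Hamming distance to a set of input sequences. The function names and parameter choices are illustrative, not taken from any tool discussed in the paper.

```python
import math
import random

def max_hamming(candidate, strings):
    """Maximum Hamming distance from candidate to any input string."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)

def closest_string_sa(strings, alphabet="ACGT", iters=5000, seed=0):
    """Simulated annealing for the closest-string problem (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(strings[0])
    current = list(strings[0])
    cost = max_hamming(current, strings)
    best, best_cost = current[:], cost
    for t in range(iters):
        temp = max(0.01, 1.0 - t / iters)       # linear cooling schedule
        pos = rng.randrange(n)                  # mutate a single position
        old = current[pos]
        current[pos] = rng.choice(alphabet)
        new_cost = max_hamming(current, strings)
        # accept improvements always; accept worse moves with Boltzmann probability
        if new_cost <= cost or rng.random() < math.exp(-(new_cost - cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[pos] = old                  # revert the rejected mutation
    return "".join(best), best_cost
```

The same accept/reject skeleton carries over to the docking and phylogeny problems the paper surveys; only the solution encoding and the cost function change.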
Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution
Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process increasingly recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.
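EMMA's probabilistic model is far richer than this, but its basic ingredient of scoring candidate TFBSs can be illustrated with a plain position-weight-matrix (PWM) log-odds scan. The sketch below is a generic baseline, not EMMA's actual method; all names are hypothetical.

```python
import math

def log_odds_pwm(sites, background=0.25, pseudo=0.5):
    """Build a log-odds position weight matrix from aligned binding sites.
    Pseudocounts keep unseen bases from producing -infinity scores."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        col = [s[i] for s in sites]
        scores = {}
        for base in "ACGT":
            freq = (col.count(base) + pseudo) / (len(sites) + 4 * pseudo)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def best_hit(pwm, sequence):
    """Slide the PWM along a sequence; return (best score, best offset)."""
    w = len(pwm)
    hits = [(sum(pwm[i][sequence[j + i]] for i in range(w)), j)
            for j in range(len(sequence) - w + 1)]
    return max(hits)
```

A probabilistic CRM model layers evolutionary gain/loss dynamics and alignment uncertainty on top of this kind of per-site score.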
A Functional Selection Model Explains Evolutionary Robustness Despite Plasticity in Regulatory Networks
Evolutionary rewiring of regulatory networks is an important source of diversity among species. Previous evidence suggested substantial divergence of regulatory networks across species. However, systematically assessing the extent of this plasticity and its functional implications has been challenging due to limited experimental data and the noisy nature of computational predictions. Here, we introduce a novel approach to study cis-regulatory evolution, and use it to trace the regulatory history of 88 DNA motifs of transcription factors across 23 Ascomycota fungi. While motifs are conserved, we find a pervasive gain and loss in the regulation of their target genes. Despite this turnover, the biological processes associated with a motif are generally conserved. We explain these trends using a model with a strong selection to conserve the overall function of a transcription factor, and a much weaker selection over the specific genes it targets. The model also accounts for the turnover of bound targets measured experimentally across species in yeasts and mammals. Thus, selective pressures on regulatory networks mostly tolerate local rewiring, and may allow for subtle fine-tuning of gene regulation during evolution
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
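The estimation problems the survey covers share one EM skeleton: alternate between computing posterior responsibilities for the hidden variables (E-step) and re-estimating parameters from expected counts (M-step). A minimal self-contained instance is the classic two-coin mixture, where the identity of the coin behind each trial is hidden; this sketch is illustrative and not taken from the article.

```python
def em_two_coins(trials, iters=50):
    """EM for a mixture of two biased coins.
    Each trial is (heads, tails) drawn from one of two coins with unknown
    biases; which coin produced each trial is the hidden variable."""
    theta_a, theta_b = 0.6, 0.4          # initial guesses (must differ)
    for _ in range(iters):
        # E-step: posterior responsibility of coin A for each trial
        ha = ta = hb = tb = 0.0
        for h, t in trials:
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            r = like_a / (like_a + like_b)
            ha += r * h; ta += r * t
            hb += (1 - r) * h; tb += (1 - r) * t
        # M-step: re-estimate each bias from its expected head/tail counts
        theta_a = ha / (ha + ta)
        theta_b = hb / (hb + tb)
    return theta_a, theta_b
```

Motif discovery replaces the coin bias with a position weight matrix and the coin identity with the unknown motif location, but the E/M alternation is identical.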
Blueprint: describing the complexity of metabolic regulation through the reconstruction of integrated metabolic and regulatory models
Ph.D. thesis in Biomedical Engineering. A metabolic model can predict the phenotype of an organism. However, these models
can produce incorrect predictions, as some metabolic processes are controlled by regulatory
mechanisms. Accordingly, several methodologies have been developed to improve metabolic models
through the integration of regulatory networks. Nevertheless, the reconstruction of genome-scale regulatory and metabolic models for diverse organisms poses several challenges.
In this work, we propose the development of several tools for the reconstruction and analysis
of genome-scale metabolic and regulatory models. First, we describe Biological
networks constraint-based In Silico Optimization (BioISO), a new tool to assist the manual curation
of metabolic models. BioISO uses a recursive relation algorithm to guide phenotype predictions. This tool can therefore reduce the number of artifacts in metabolic models,
decreasing the chance of errors during the curation phase.
In the second part of this work, we developed a repository of regulatory networks for prokaryotes that supports their integration into metabolic models. The Prokaryotic Transcriptional
Regulatory Network Database (ProTReND) includes several tools to extract and process regulatory information from external resources. It contains a data integration system that
converts scattered regulatory data into integrated regulatory networks. In addition, ProTReND provides
an application that grants full access to the regulatory data.
Finally, we developed a computational tool in MEWpy to simulate and analyze regulatory and metabolic models. This tool can read a metabolic model and/or a regulatory network
in several formats. It builds an integrated regulatory and metabolic model
using the regulatory interactions and the gene-protein links encoded in the metabolic model and the regulatory network. Moreover, it supports several phenotype prediction methods
implemented specifically for the analysis of regulatory-metabolic models.
Genome-Scale Metabolic (GEM) models can predict the phenotypic behavior of organisms. However,
these models can lead to incorrect predictions, as certain metabolic processes are controlled by regulatory
mechanisms. Accordingly, many methodologies have been developed to extend the reconstruction and
analysis of GEM models via the integration of Transcriptional Regulatory Networks (TRNs). Nevertheless,
the perspective of reconstructing integrated genome-scale regulatory and metabolic models for diverse
prokaryotes is still an open challenge.
In this work, we propose several tools to assist the reconstruction and analysis of regulatory and
metabolic models. We start by describing BioISO, a novel tool to assist the manual curation of GEM
models. BioISO uses a recursive relation-like algorithm and Flux Balance Analysis (FBA) to evaluate and
guide debugging of in silico phenotype predictions. Hence, this tool can reduce the number of artifacts in
GEM models, decreasing the burdens of model refinement and curation.
A state-of-the-art repository of TRNs for prokaryotes was implemented to support the reconstruction
and integration of TRNs into GEM models. The ProTReND repository comprehends several tools to extract
and process regulatory information available in several resources. More importantly, this repository contains a data integration system to unify the regulatory data into standardized TRNs at the genome scale.
In addition, ProTReND contains a web application with full access to the regulatory data.
Finally, we have developed a new modeling framework to define, simulate and analyze GEnome-scale
Regulatory and Metabolic (GERM) models in MEWpy. The GERM model framework can read a GEM
model, as well as a TRN from different file formats. This framework assembles a GERM model using
the regulatory interactions and Genes-Proteins-Reactions (GPR) rules encoded into the GEM model and
TRN. In addition, this modeling framework supports several methods of phenotype prediction designed
for regulatory-metabolic models.
I would like to thank Fundação para a Ciência e Tecnologia for the Ph.D. studentship I was awarded (SFRH/BD/139198/2018).
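The core idea of an integrated regulatory-metabolic (GERM) model, iterating Boolean regulatory rules to a steady state and then using GPR rules to decide which reactions remain available, can be sketched in a few lines. This is a toy illustration under assumed data structures, not MEWpy's actual API; all names are hypothetical.

```python
def evaluate_regulation(rules, conditions, iters=10):
    """Synchronously update Boolean regulatory rules until a fixed point.
    `rules` maps gene -> function of the current state; `conditions` seeds
    environmental signals (e.g. nutrient availability)."""
    state = dict(conditions)
    for gene in rules:
        state.setdefault(gene, True)       # genes default to expressed
    for _ in range(iters):
        new = dict(state)
        for gene, rule in rules.items():
            new[gene] = rule(state)
        if new == state:                   # fixed point reached
            break
        state = new
    return state

def active_reactions(gprs, gene_state):
    """Apply GPR rules: a reaction stays available only if its GPR
    evaluates to True under the regulatory gene states."""
    return {rxn for rxn, gpr in gprs.items() if gpr(gene_state)}
```

In a full framework, the inactive reactions would be constrained to zero flux before running a phenotype prediction method such as FBA on the remaining network.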
Effective Side Effect Machines for Decoding
The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur when using error correcting codes in biological applications. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes.
Several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding SEMs for edit metric decoding. Direct and fuzzy classification methods are compared while also changing some of the EP settings to observe how decoding accuracy is affected. Regardless of code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1, whereas distances 2 and 3 achieve up to 97.1% and 85.9%, respectively. The accuracy suffers for longer codes: the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7%, and 69.2% for distances 1, 2, and 3, respectively. Additionally, the SEMs are examined for potential bloat by comparing the number of reachable states against the total number of states. Bloat is seen more in larger machines than in smaller ones. Furthermore, the results are analyzed to find potential trends and relationships among the parameters, with the most consistent trend being that, when allowed, the longer codes generally show a propensity for larger machines.
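For context on what an edit metric decoder must compute, here is a brute-force nearest-codeword decoder under Levenshtein distance; evolved SEMs aim to approximate this classification without the exhaustive search. The sketch and its names are illustrative, not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def decode(received, codebook):
    """Baseline edit-metric decoder: return the codeword nearest to the
    received string; cost grows linearly with codebook size."""
    return min(codebook, key=lambda c: edit_distance(received, c))
```

The linear scan over the codebook is exactly the cost that a trained state machine avoids by classifying the received string in a single pass.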