16,868 research outputs found
Integration of Biological Sources: Exploring the Case of Protein Homology
Data integration is a key issue in the domain of bioin- formatics, which deals with huge amounts of heteroge- neous biological data that grows and changes rapidly. This paper serves as an introduction in the field of bioinformatics and the biological concepts it deals with, and an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioin- formatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Un- certain databases are able to contain several possi- ble worlds, and it has been proposed that they can be used to significantly reduce initial integration efforts. We propose several directions for future work where uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration
Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm
Peer reviewedPublisher PD
Hot-spot analysis for drug discovery targeting protein-protein interactions
Introduction: Protein-protein interactions are important for biological processes and pathological situations, and are attractive targets for drug discovery. However, rational drug design targeting protein-protein interactions is still highly challenging. Hot-spot residues are seen as the best option to target such interactions, but their identification requires detailed structural and energetic characterization, which is only available for a tiny fraction of protein interactions.
Areas covered: In this review, the authors cover a variety of computational methods that have been reported for the energetic analysis of protein-protein interfaces in search of hot-spots, and the structural modeling of protein-protein complexes by docking. This can help to rationalize the discovery of small-molecule inhibitors of protein-protein interfaces of therapeutic interest. Computational analysis and docking can help to locate the interface, molecular dynamics can be used to find suitable cavities, and hot-spot predictions can focus the search for inhibitors of protein-protein interactions.
Expert opinion: A major difficulty for applying rational drug design methods to protein-protein interactions is that in the majority of cases the complex structure is not available. Fortunately, computational docking can complement experimental data. An interesting aspect to explore in the future is the integration of these strategies for targeting PPIs with large-scale mutational analysis.This work has been funded by grants BIO2016-79930-R and SEV-2015-0493 from the Spanish Ministry of Economy, Industry and Competitiveness, and grant EFA086/15 from EU Interreg V POCTEFA. M Rosell is supported by an FPI fellowship from the Severo Ochoa program. The authors are grateful for the support of the the Joint BSC-CRG-IRB Programme in Computational Biology.Peer ReviewedPostprint (author's final draft
Data-based fault detection in chemical processes: Managing records with operator intervention and uncertain labels
Developing data-driven fault detection systems for chemical plants requires managing uncertain data labels and dynamic attributes due to operator-process interactions. Mislabeled data is a known problem in computer science that has received scarce attention from the process systems community. This work introduces and examines the effects of operator actions in records and labels, and the consequences in the development of detection models. Using a state space model, this work proposes an iterative relabeling scheme for retraining classifiers that continuously refines dynamic attributes and labels. Three case studies are presented: a reactor as a motivating example, flooding in a simulated de-Butanizer column, as a complex case, and foaming in an absorber as an industrial challenge. For the first case, detection accuracy is shown to increase by 14% while operating costs are reduced by 20%. Moreover, regarding the de-Butanizer column, the performance of the proposed strategy is shown to be 10% higher than the filtering strategy. Promising results are finally reported in regard of efficient strategies to deal with the presented problemPeer ReviewedPostprint (author's final draft
Emergence of switch-like behavior in a large family of simple biochemical networks
Bistability plays a central role in the gene regulatory networks (GRNs)
controlling many essential biological functions, including cellular
differentiation and cell cycle control. However, establishing the network
topologies that can exhibit bistability remains a challenge, in part due to the
exceedingly large variety of GRNs that exist for even a small number of
components. We begin to address this problem by employing chemical reaction
network theory in a comprehensive in silico survey to determine the capacity
for bistability of more than 40,000 simple networks that can be formed by two
transcription factor-coding genes and their associated proteins (assuming only
the most elementary biochemical processes). We find that there exist reaction
rate constants leading to bistability in ~90% of these GRN models, including
several circuits that do not contain any of the TF cooperativity commonly
associated with bistable systems, and the majority of which could only be
identified as bistable through an original subnetwork-based analysis. A
topological sorting of the two-gene family of networks based on the presence or
absence of biochemical reactions reveals eleven minimal bistable networks
(i.e., bistable networks that do not contain within them a smaller bistable
subnetwork). The large number of previously unknown bistable network topologies
suggests that the capacity for switch-like behavior in GRNs arises with
relative ease and is not easily lost through network evolution. To highlight
the relevance of the systematic application of CRNT to bistable network
identification in real biological systems, we integrated publicly available
protein-protein interaction, protein-DNA interaction, and gene expression data
from Saccharomyces cerevisiae, and identified several GRNs predicted to behave
in a bistable fashion.Comment: accepted to PLoS Computational Biolog
Cronobacter, the emergent bacterial pathogen Enterobacter sakazakii comes of age; MLST and whole genome sequence analysis
Background: Following the association of Cronobacter spp. to several publicized fatal outbreaks in neonatal intensive care units of meningitis and necrotising enterocolitis, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/ cronobacter/) containing over 1000 isolates with metadata along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections Results: Whole genome sequencing and multilocus sequence typing (MLST) has supports the formal recognition of the genus Cronobacter composed of seven species to replace the former single species Enterobacter sakazakii. Applyingthe 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. Conclusions: The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the association of C. sakazakii CC4 pathogenicity, and propensity for neonatal CNS
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilisic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks.Comment: 46 pages, 14 figures, 3 table
Learning and Designing Stochastic Processes from Logical Constraints
Stochastic processes offer a flexible mathematical formalism to model and
reason about systems. Most analysis tools, however, start from the premises
that models are fully specified, so that any parameters controlling the
system's dynamics must be known exactly. As this is seldom the case, many
methods have been devised over the last decade to infer (learn) such parameters
from observations of the state of the system. In this paper, we depart from
this approach by assuming that our observations are {\it qualitative}
properties encoded as satisfaction of linear temporal logic formulae, as
opposed to quantitative observations of the state of the system. An important
feature of this approach is that it unifies naturally the system identification
and the system design problems, where the properties, instead of observations,
represent requirements to be satisfied. We develop a principled statistical
estimation procedure based on maximising the likelihood of the system's
parameters, using recent ideas from statistical machine learning. We
demonstrate the efficacy and broad applicability of our method on a range of
simple but non-trivial examples, including rumour spreading in social networks
and hybrid models of gene regulation
Exploiting network topology for large-scale inference of nonlinear reaction models
The development of chemical reaction models aids understanding and prediction
in areas ranging from biology to electrochemistry and combustion. A systematic
approach to building reaction network models uses observational data not only
to estimate unknown parameters, but also to learn model structure. Bayesian
inference provides a natural approach to this data-driven construction of
models. Yet traditional Bayesian model inference methodologies that numerically
evaluate the evidence for each model are often infeasible for nonlinear
reaction network inference, as the number of plausible models can be
combinatorially large. Alternative approaches based on model-space sampling can
enable large-scale network inference, but their realization presents many
challenges. In this paper, we present new computational methods that make
large-scale nonlinear network inference tractable. First, we exploit the
topology of networks describing potential interactions among chemical species
to design improved "between-model" proposals for reversible-jump Markov chain
Monte Carlo. Second, we introduce a sensitivity-based determination of move
types which, when combined with network-aware proposals, yields significant
additional gains in sampling performance. These algorithms are demonstrated on
inference problems drawn from systems biology, with nonlinear differential
equation models of species interactions
Engineering simulations for cancer systems biology
Computer simulation can be used to inform in vivo and in vitro experimentation, enabling rapid, low-cost hypothesis generation and directing experimental design in order to test those hypotheses. In this way, in silico models become a scientific instrument for investigation, and so should be developed to high standards, be carefully calibrated and their findings presented in such that they may be reproduced. Here, we outline a framework that supports developing simulations as scientific instruments, and we select cancer systems biology as an exemplar domain, with a particular focus on cellular signalling models. We consider the challenges of lack of data, incomplete knowledge and modelling in the context of a rapidly changing knowledge base. Our framework comprises a process to clearly separate scientific and engineering concerns in model and simulation development, and an argumentation approach to documenting models for rigorous way of recording assumptions and knowledge gaps. We propose interactive, dynamic visualisation tools to enable the biological community to interact with cellular signalling models directly for experimental design. There is a mismatch in scale between these cellular models and tissue structures that are affected by tumours, and bridging this gap requires substantial computational resource. We present concurrent programming as a technology to link scales without losing important details through model simplification. We discuss the value of combining this technology, interactive visualisation, argumentation and model separation to support development of multi-scale models that represent biologically plausible cells arranged in biologically plausible structures that model cell behaviour, interactions and response to therapeutic interventions
- …