5 research outputs found
Shaping and Dilating the Fitness Landscape for Parameter Estimation in Stochastic Biochemical Models
The parameter estimation (PE) of biochemical reactions is one of the most challenging tasks in systems biology given the pivotal role of these kinetic constants in driving the behavior of biochemical systems. PE is a non-convex, multi-modal, and non-separable optimization problem with an unknown fitness landscape; moreover, the quantities of the biochemical species appearing in the system can be low, making biological noise a non-negligible phenomenon and mandating the use of stochastic simulation. Finally, the values of the kinetic parameters typically follow a log-uniform distribution; thus, the optimal solutions are situated in the lowest orders of magnitude of the search space. In this work, we further elaborate on a novel approach to address the PE problem based on a combination of adaptive swarm intelligence and dilation functions (DFs). DFs require prior knowledge of the characteristics of the fitness landscape; therefore, we leverage an alternative solution to evolve optimal DFs. On top of this approach, we introduce surrogate Fourier modeling to simplify the PE, by producing a smoother version of the fitness landscape that excludes the high-frequency components of the fitness function. Our results show that the PE exploiting evolved DFs has a performance comparable with that of the PE run with a custom DF. Moreover, surrogate Fourier modeling allows for improving the convergence speed. Finally, we discuss some open problems related to the scalability of our methodology.
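As a rough illustration of the idea behind the surrogate Fourier modeling mentioned in the abstract above, the following minimal sketch low-pass filters a sampled one-dimensional toy fitness function. It is not the authors' implementation: the function names (`dft`, `inverse_dft`, `low_pass_surrogate`), the toy landscape, and the cutoff `keep` are all assumptions made for this example, and a naive discrete Fourier transform is used only to keep the code dependency-free.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform (hypothetical helper)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Naive inverse discrete Fourier transform (hypothetical helper)."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass_surrogate(samples, keep):
    """Return a smoothed copy of `samples` that keeps only the Fourier
    components whose frequency index is at most `keep` (an assumed cutoff)."""
    n = len(samples)
    coeffs = dft(samples)
    # min(k, n - k) is the frequency index of coefficient k for real-valued input.
    filtered = [c if min(k, n - k) <= keep else 0 for k, c in enumerate(coeffs)]
    return [x.real for x in inverse_dft(filtered)]


def total_variation(values):
    """Sum of absolute differences between neighbors, a crude ruggedness measure."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Toy landscape (an assumption): a smooth bowl plus a high-frequency ripple.
n = 64
xs = [i / n for i in range(n)]
rugged = [(x - 0.3) ** 2 + 0.05 * math.sin(2 * math.pi * 20 * x) for x in xs]
smooth = low_pass_surrogate(rugged, keep=5)
```

In this toy case the ripple sits at frequency index 20, so a cutoff of 5 removes it and leaves a landscape with much lower total variation, which is the kind of smoothing an optimizer could plausibly exploit.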
Reliable Generation of Native-Like Decoys Limits Predictive Ability in Fragment-Based Protein Structure Prediction
Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher-accuracy predictions.
A Framework for Semantic Similarity Measures to enhance Knowledge Graph Quality
Precisely determining similarity values among real-world entities is a building block for data-driven tasks, e.g., ranking, relation discovery or integration. Semantic Web and Linked Data initiatives have promoted the publication of large semi-structured datasets in the form of knowledge graphs. Knowledge graphs encode semantics that describe resources in terms of several aspects or resource characteristics, e.g., neighbors, class hierarchies or attributes. Existing similarity measures take these aspects into account in isolation, which may prevent them from delivering accurate similarity values. In this thesis, the resource characteristics relevant to accurately determining similarity values are identified and considered in a cumulative way in a framework of four similarity measures. Additionally, the impact of considering these resource characteristics during the computation of similarity values is analyzed in three data-driven tasks for the enhancement of knowledge graph quality.
First, according to the identified resource characteristics, new similarity measures able to combine two or more of them are described. In total four similarity measures are presented in an evolutionary order. While the first three similarity measures, OnSim, IC-OnSim and GADES, combine the resource characteristics according to a human defined aggregation function, the last one, GARUM, makes use of a machine learning regression approach to determine the relevance of each resource characteristic during the computation of the similarity.
Second, the suitability of each measure for real-time applications is studied by means of a theoretical and an empirical comparison. The theoretical comparison consists of a study of the worst-case computational complexity of each similarity measure. The empirical comparison is based on the execution times of the different similarity measures in two third-party benchmarks involving the comparison of semantically annotated entities.
Ultimately, the impact of the described similarity measures is shown in three data-driven tasks for the enhancement of knowledge graph quality: relation discovery, dataset integration and evolution analysis of annotation datasets. Empirical results show that relation discovery and dataset integration tasks obtain better results when considering semantics encoded in semantic similarity measures. Further, using semantic similarity measures in the evolution analysis tasks allows for defining new informative metrics able to give an overview of the evolution of the whole annotation set, instead of the individual annotations, as state-of-the-art evolution analysis frameworks do.
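The thesis abstract above describes measures that combine several resource characteristics through a human-defined aggregation function. The sketch below shows one plausible shape such a combination could take, a weighted average of per-characteristic Jaccard similarities. It is not the definition of OnSim, IC-OnSim, GADES or GARUM: the helper names, the choice of Jaccard, the example entities and the equal weights are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def aggregated_similarity(entity1, entity2, weights):
    """Weighted average of per-characteristic similarities.

    `weights` maps a characteristic name (hypothetical keys such as
    "ancestors", "neighbors", "attributes") to its non-negative weight.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    per_aspect = {
        aspect: jaccard(entity1[aspect], entity2[aspect]) for aspect in weights
    }
    return sum(weights[aspect] * per_aspect[aspect] for aspect in weights) / total


# Two toy entities (assumed data) described by three resource characteristics.
e1 = {
    "ancestors": {"Thing", "Protein", "Enzyme"},
    "neighbors": {"geneA", "pathwayX"},
    "attributes": {"human"},
}
e2 = {
    "ancestors": {"Thing", "Protein", "Receptor"},
    "neighbors": {"geneA"},
    "attributes": {"human"},
}
equal_weights = {"ancestors": 1.0, "neighbors": 1.0, "attributes": 1.0}
score = aggregated_similarity(e1, e2, equal_weights)
```

A learned variant, in the spirit of what the abstract attributes to GARUM, would fit the weights with a regression model rather than fixing them by hand; that step is not shown here.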
Can cyanobacterial diversity in the source predict the diversity in sludge and the risk of toxin release in a drinking water treatment plant?
Conventional processes (coagulation, flocculation, sedimentation, and filtration) are widely used in drinking water treatment plants and are considered a good treatment strategy to eliminate cyanobacterial cells and cell-bound cyanotoxins. The diversity of cyanobacteria was investigated using taxonomic cell counts and shotgun metagenomics over two seasons in a drinking water treatment plant before, during, and after the bloom. Changes in the community structure over time at the phylum, genus, and species levels were monitored in samples retrieved from raw water (RW), sludge in the holding tank (ST), and sludge supernatant (SST). Aphanothece clathrata brevis, Microcystis aeruginosa, Dolichospermum spiroides, and Chroococcus minimus were the predominant species detected in RW by taxonomic cell counts. Shotgun metagenomics revealed that Proteobacteria was the predominant phylum in RW before and after the cyanobacterial bloom. Taxonomic cell counts and shotgun metagenomics showed that the Dolichospermum bloom occurred inside the plant. Cyanobacteria and Bacteroidetes were the major bacterial phyla during the bloom. Shotgun metagenomics also showed that Synechococcus, Microcystis, and Dolichospermum were the predominant detected cyanobacterial genera in the samples. Conventional treatment removed more than 92% of cyanobacterial cells but led to cell accumulation in the sludge up to 31 times more than in the RW influx. Coagulation/sedimentation selectively removed more than 96% of Microcystis and Dolichospermum. The cyanobacterial community in the sludge varied from raw water to sludge during sludge storage (1–13 days). This variation was due to the selective removal by coagulation/sedimentation as well as the accumulation of captured cells over the period of storage time. However, the prediction of the cyanobacterial community composition in the SST remained a challenge.
Among nutrient parameters, orthophosphate availability was related to community profile in RW samples, whereas communities in ST were influenced by total nitrogen, Kjeldahl nitrogen (N-Kjeldahl), total and particulate phosphorus, and total organic carbon (TOC). No trend was observed in the impact of nutrients on SST communities. This study profiled new health-related, environmental, and technical challenges for the production of drinking water due to the complex fate of cyanobacteria in cyanobacteria-laden sludge and supernatant.
Computational Approaches To Improving The Reconstruction Of Metabolic Pathway
Metabolic pathway reconstruction is the essence of systems biology where in silico modeling
and prediction of the cell's function is based on the interaction of the cell's components
represented as a network of reactions. The reconstructed model and the associated database
of information about the organism's genes and their functional roles facilitate a variety of
analysis and simulation techniques that can enrich our understanding. However, there are
unresolved issues for genome-scale metabolic network reconstruction, such as our incomplete
knowledge of the cell's networks for metabolism, transport, and regulation; the completeness,
accuracy, and specificity of the annotation of genomes; and our ability to fully utilise the
available information from -omics (genomics, proteomics, metabolomics, etc.) for the reconstruction
of the networks. These issues result in incomplete metabolic models, which limit
our ability to perform analysis of and to make predictions about the cell that are based on
the network model.
This dissertation discusses the state-of-the-art of metabolic pathway reconstruction and highlights
the outstanding issues. In particular, we consider a number of case studies using
genomes of fungi relevant to industrial applications, such as biofuels, to demonstrate the
performance of existing techniques and illustrate the issues. Our case studies focus on the
cell's central metabolism, and the utilisation and transport of sugars as a carbon source,
since these are essential concerns for industrial applications.
A significant deficiency in the existing state-of-the-art for the reconstruction of metabolic
pathways is the limited ability to associate genes and proteins with the transport reactions that move
specific compounds across the membranes of the cell. The dissertation reviews the state-of-the-art
of prediction methods for transmembrane transport proteins by developing a scheme
to describe and compare existing methods, and applying the existing techniques to the
fungal genome of A. niger CBS 513.88. This reveals the split between those methods that
use the Transporter Classification (TC) as their target for prediction, and those that use
the type of chemical substrates being transported as their target. Despite this difficulty in
comparing approaches, it is clear that the state-of-the-art cannot predict specific substrates
being transported, and hence cannot associate genes and proteins to the transport reactions.
The dissertation presents TransATH, which stands for Transporters via ATH (Annotation
Transfer by Homology), a system which automates Saier's protocol and includes the computation
of subcellular localization and improves the computation of transmembrane segments.
The choice of thresholds for the parameters of TransATH is investigated to determine optimal
performance as defined by a gold standard set of transporters and non-transporters from
S. cerevisiae. The dissertation demonstrates TransATH on the fungal genome of A. niger
CBS 513.88 and evaluates the correctness of TransATH using the curated information in
AspGD (the Aspergillus Database). A website for TransATH is available for use.
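The abstract above mentions choosing thresholds for TransATH parameters against a gold standard set of transporters and non-transporters. The following minimal sketch shows one generic way such a choice could be made, by sweeping candidate thresholds and keeping the one with the best F1 score. The metric, the function names, the scores and the labels are all assumptions for illustration; the abstract does not state which performance measure or parameters the dissertation actually uses.

```python
def f1_at(threshold, scored):
    """F1 score when every item with score >= threshold is called a transporter.

    `scored` is a list of (score, is_transporter) pairs from a labeled set.
    """
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at(t, scored))


# Hypothetical gold standard: (similarity score, is_transporter) pairs.
gold = [
    (0.95, True),
    (0.80, True),
    (0.65, False),
    (0.60, True),
    (0.40, False),
    (0.20, False),
]
candidates = [0.3, 0.5, 0.7, 0.9]
chosen = best_threshold(gold, candidates)
```

With several interacting parameters, the same sweep would run over a grid of parameter combinations rather than a single list of thresholds.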