5 research outputs found

Shaping and Dilating the Fitness Landscape for Parameter Estimation in Stochastic Biochemical Models

Authors: Cazzaniga P., Manzoni L., Nobile M. S., Papetti D. M., Spolaor S.
Publication date: 01/01/2022

The parameter estimation (PE) of biochemical reactions is one of the most challenging tasks in systems biology given the pivotal role of these kinetic constants in driving the behavior of biochemical systems. PE is a non-convex, multi-modal, and non-separable optimization problem with an unknown fitness landscape; moreover, the quantities of the biochemical species appearing in the system can be low, making biological noise a non-negligible phenomenon and mandating the use of stochastic simulation. Finally, the values of the kinetic parameters typically follow a log-uniform distribution; thus, the optimal solutions are situated in the lowest orders of magnitude of the search space. In this work, we further elaborate on a novel approach to address the PE problem based on a combination of adaptive swarm intelligence and dilation functions (DFs). DFs require prior knowledge of the characteristics of the fitness landscape; therefore, we leverage an alternative solution to evolve optimal DFs. On top of this approach, we introduce surrogate Fourier modeling to simplify the PE, by producing a smoother version of the fitness landscape that excludes the high-frequency components of the fitness function. Our results show that the PE exploiting evolved DFs has a performance comparable with that of the PE run with a custom DF. Moreover, surrogate Fourier modeling allows for improving the convergence speed. Finally, we discuss some open problems related to the scalability of our methodology.

Archivio istituzionale della ricerca - Università di Trieste

Pure OAI Repository

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Reliable Generation of Native-Like Decoys Limits Predictive Ability in Fragment-Based Protein Structure Prediction

Authors: Garza-Fabre M, Handl J, Kandathil SM, Lovell SC
Publication date: 15/10/2019

Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher-accuracy predictions.

UCL Discovery

A Framework for Semantic Similarity Measures to enhance Knowledge Graph Quality

Author: Traverso Ribón Ignacio
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2017

Precisely determining similarity values among real-world entities is a building block for data-driven tasks, e.g., ranking, relation discovery or integration. Semantic Web and Linked Data initiatives have promoted the publication of large semi-structured datasets in the form of knowledge graphs. Knowledge graphs encode semantics that describe resources in terms of several aspects or resource characteristics, e.g., neighbors, class hierarchies or attributes. Existing similarity measures take these aspects into account in isolation, which may prevent them from delivering accurate similarity values. In this thesis, the resource characteristics relevant to accurately determining similarity values are identified and considered in a cumulative way in a framework of four similarity measures. Additionally, the impact of considering these resource characteristics during the computation of similarity values is analyzed in three data-driven tasks for the enhancement of knowledge graph quality. First, according to the identified resource characteristics, new similarity measures able to combine two or more of them are described. In total, four similarity measures are presented in an evolutionary order. While the first three similarity measures, OnSim, IC-OnSim and GADES, combine the resource characteristics according to a human-defined aggregation function, the last one, GARUM, makes use of a machine learning regression approach to determine the relevance of each resource characteristic during the computation of the similarity. Second, the suitability of each measure for real-time applications is studied by means of a theoretical and an empirical comparison. The theoretical comparison consists of a study of the worst-case computational complexity of each similarity measure. The empirical comparison is based on the execution times of the different similarity measures in two third-party benchmarks involving the comparison of semantically annotated entities.
Ultimately, the impact of the described similarity measures is shown in three data-driven tasks for the enhancement of knowledge graph quality: relation discovery, dataset integration and evolution analysis of annotation datasets. Empirical results show that relation discovery and dataset integration tasks obtain better results when considering semantics encoded in semantic similarity measures. Further, using semantic similarity measures in the evolution analysis tasks allows for defining new informative metrics able to give an overview of the evolution of the whole annotation set, instead of the individual annotations as in state-of-the-art evolution analysis frameworks.

KITopen

Can cyanobacterial diversity in the source predict the diversity in sludge and the risk of toxin release in a drinking water treatment plant?

Authors: Dorner Sarah, Fortin Nathalie, Guerra Maldonado Juan Francisco, Jalili Farhad, Prévost Michèle, Sauve Sebastien, Shapiro B. Jesse, Terrat Yves, Trigui Hana, Zamyadi Arash
Publication venue: MDPI
Publication date: 01/01/2021

Conventional processes (coagulation, flocculation, sedimentation, and filtration) are widely used in drinking water treatment plants and are considered a good treatment strategy to eliminate cyanobacterial cells and cell-bound cyanotoxins. The diversity of cyanobacteria was investigated using taxonomic cell counts and shotgun metagenomics over two seasons in a drinking water treatment plant before, during, and after the bloom. Changes in the community structure over time at the phylum, genus, and species levels were monitored in samples retrieved from raw water (RW), sludge in the holding tank (ST), and sludge supernatant (SST). Aphanothece clathrata brevis, Microcystis aeruginosa, Dolichospermum spiroides, and Chroococcus minimus were the predominant species detected in RW by taxonomic cell counts. Shotgun metagenomics revealed that Proteobacteria was the predominant phylum in RW before and after the cyanobacterial bloom. Taxonomic cell counts and shotgun metagenomics showed that the Dolichospermum bloom occurred inside the plant. Cyanobacteria and Bacteroidetes were the major bacterial phyla during the bloom. Shotgun metagenomics also showed that Synechococcus, Microcystis, and Dolichospermum were the predominant cyanobacterial genera detected in the samples. Conventional treatment removed more than 92% of cyanobacterial cells but led to cell accumulation in the sludge up to 31 times more than in the RW influx. Coagulation/sedimentation selectively removed more than 96% of Microcystis and Dolichospermum. The cyanobacterial community varied from raw water to sludge during sludge storage (1–13 days). This variation was due to the selective removal by coagulation/sedimentation as well as the accumulation of captured cells over the storage period. However, the prediction of the cyanobacterial community composition in the SST remained a challenge.
Among nutrient parameters, orthophosphate availability was related to community profile in RW samples, whereas communities in ST were influenced by total nitrogen, Kjeldahl nitrogen (N-Kjeldahl), total and particulate phosphorus, and total organic carbon (TOC). No trend was observed in the impact of nutrients on SST communities. This study profiled new health-related, environmental, and technical challenges for the production of drinking water due to the complex fate of cyanobacteria in cyanobacteria-laden sludge and supernatant.

PolyPublie

University of Melbourne Institutional Repository

Computational Approaches To Improving The Reconstruction Of Metabolic Pathway

Author: Aplop Faizah
Publication date: 24/05/2016

Metabolic pathway reconstruction is the essence of systems biology where in silico modeling and prediction of the cell's function is based on the interaction of the cell's components represented as a network of reactions. The reconstructed model and the associated database of information about the organism's genes and their functional roles facilitate a variety of analysis and simulation techniques that can enrich our understanding. However, there are unresolved issues for genome-scale metabolic network reconstruction, such as our incomplete knowledge of the cell's networks for metabolism, transport, and regulation; the completeness, accuracy, and specificity of the annotation of genomes; and our ability to fully utilise the available information from -omics (genomics, proteomics, metabolomics, etc.) for the reconstruction of the networks. These issues result in incomplete metabolic models, which limit our ability to perform analysis of and to make predictions about the cell that are based on the network model. This dissertation discusses the state-of-the-art of metabolic pathway reconstruction and highlights the outstanding issues. In particular, we consider a number of case studies using genomes of fungi relevant to industrial applications, such as biofuels, to demonstrate the performance of existing techniques and illustrate the issues. Our case studies focus on the cell's central metabolism, and the utilisation and transport of sugars as a carbon source, since these are essential concerns for industrial applications. A significant deficiency in the existing state-of-the-art for the reconstruction of metabolic pathways is the inability to associate genes and proteins with the transport reactions that move specific compounds across the membranes of the cell.
The dissertation reviews the state-of-the-art of prediction methods for transmembrane transport proteins by developing a scheme to describe and compare existing methods, and applying the existing techniques to the fungal genome of A. niger CBS 513.88. This reveals the split between those methods that use the Transporter Classification (TC) as their target for prediction, and those that use the type of chemical substrates being transported as their target. Despite this difficulty in comparing approaches, it is clear that the state-of-the-art cannot predict specific substrates being transported, and hence cannot associate genes and proteins with the transport reactions. The dissertation presents TransATH, which stands for Transporters via ATH (Annotation Transfer by Homology), a system which automates Saier's protocol, includes the computation of subcellular localization, and improves the computation of transmembrane segments. The choice of thresholds for the parameters of TransATH is investigated to determine optimal performance as defined by a gold standard set of transporters and non-transporters from S. cerevisiae. The dissertation demonstrates TransATH on the fungal genome of A. niger CBS 513.88 and evaluates the correctness of TransATH using the curated information in AspGD (the Aspergillus Database). A website for TransATH is available for use.

Concordia University Research Repository