873 research outputs found

    Improvement of KiMoSys framework for kinetic modelling

    Over the past years, the increasing amount of biological data produced has shown the importance of data repositories. Databases give the scientific community an easier way to reuse and share research data. Among their most important features are quick access to data, described by metadata and available in standard formats, and compliance with the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles for data management. KiMoSys (https://kimosys.org) is a public domain-specific repository of experimental data, containing concentration data of enzymes and metabolites as well as flux data. It offers a web-based interface and upload facility to publish data, making them accessible in standard formats, while also integrating kinetic models related to the data. This thesis contributes to the improvement and extension of KiMoSys. It includes the addition of more downloadable data formats, the introduction of data visualization, the incorporation of more tools to filter data, the integration of a simulation environment for kinetic models and the inclusion of a unique persistent identifier system. The result is a new version of KiMoSys with a renewed interface, multiple new features, and enhancements to the previously existing ones, all in accordance with the FAIR data principles. KiMoSys v2.0 is therefore expected to be an important tool for the systems biology modeling community.

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools

    Background: Qualitative frameworks, especially those based on the logical discrete formalism, are increasingly used to model regulatory and signalling networks. A major advantage of these frameworks is that they do not require precise quantitative data, and that they are well-suited for studies of large networks. While numerous groups have developed specific computational tools that provide original methods to analyse qualitative models, a standard format to exchange qualitative models has been missing. Results: We present the Systems Biology Markup Language (SBML) Qualitative Models Package (“qual”), an extension of the SBML Level 3 standard designed for computer representation of qualitative models of biological networks. We demonstrate the interoperability of models via SBML qual through the analysis of a specific signalling network by three independent software tools. Furthermore, the collective effort to define the SBML qual format paved the way for the development of LogicalModel, an open-source model library, which will facilitate the adoption of the format as well as the collaborative development of algorithms to analyse qualitative models. Conclusions: SBML qual allows the exchange of qualitative models among a number of complementary software tools. SBML qual has the potential to promote collaborative work on the development of novel computational approaches, as well as on the specification and the analysis of comprehensive qualitative models of regulatory and signalling networks

    Polyaseeker: a computational framework for identifying polyadenylation cleavage site from RNA-seq

    Alternative polyadenylation (APA) of mRNA plays a crucial role in post-transcriptional gene regulation. Recent advances in next-generation sequencing technology have made it possible to characterize the transcriptome efficiently and to identify the 3’ ends of polyadenylated RNAs. However, no comprehensive bioinformatic pipeline has fulfilled this goal. This thesis proposes PolyASeeker, a computational framework for identifying polyadenylation cleavage sites from RNA-Seq data. Using a simulated RNA-seq dataset, a novel method is developed to evaluate the performance of the proposed framework against the traditional A-stretch approach and to compute precision and recall values that previous estimates could not provide. The proposed method achieves significantly higher sensitivity than the A-stretch approach in various scenarios. In further studies, PolyASeeker is applied to human tissue-specific RNA-sequencing data, and the polyA sites identified by PolyASeeker and annotated by PolyA DB reveal distinctive isoform expression patterns among tissues. Genes with a specific 3’ UTR expression have also been recognized in the brain. PolyASeeker is also run on an mRNA 3’ UTR sequencing dataset, and the software is found to adapt well to such data. Significant isoform shortening events with expression evidence and experimental support have been found
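The evaluation described in the abstract above compares predicted cleavage sites with the known sites of a simulated dataset. A minimal sketch of how precision and recall could be computed for such a comparison follows. The function name `precision_recall`, the positional `tolerance` window, and the example coordinates are illustrative assumptions, not details taken from the PolyASeeker thesis.

```python
def precision_recall(predicted, truth, tolerance=0):
    """Return (precision, recall) for predicted site positions.

    A predicted site counts as correct if it lies within `tolerance`
    bases of any true site; a true site counts as recovered if any
    prediction lies within `tolerance` bases of it.
    The tolerance window is an assumption made for this sketch.
    """
    predicted = sorted(set(predicted))
    truth = sorted(set(truth))

    # Predictions that match at least one true site.
    correct_predictions = sum(
        any(abs(p - t) <= tolerance for t in truth) for p in predicted
    )
    # True sites that are matched by at least one prediction.
    recovered_sites = sum(
        any(abs(p - t) <= tolerance for p in predicted) for t in truth
    )

    precision = correct_predictions / len(predicted) if predicted else 0.0
    recall = recovered_sites / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical coordinates on a single transcript.
precision, recall = precision_recall(
    predicted=[100, 205, 900],
    truth=[100, 200, 500, 700],
    tolerance=10,
)
```

Recall here corresponds to the sensitivity mentioned in the abstract; on real data the comparison would also need to respect chromosome and strand.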

    Metamotifs--a generative model for building families of nucleotide position weight matrices.

    BACKGROUND: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence. RESULTS: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif' describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input position weight matrix (PWM) motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain. CONCLUSIONS: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity to infer motifs. 
Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with them. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
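For readers unfamiliar with the position weight matrix (PWM) that the metamotif model takes as input, the sketch below builds a simple PWM from aligned binding sites and scores a candidate sequence against it. It illustrates an ordinary PWM only, not the metamotif model or the NestedMICA algorithm; the function names, the pseudocount and the uniform background are assumptions chosen for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Build a PWM (one probability dict per column) from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        # Count the bases observed in column i across all sites.
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def log_odds_score(pwm, seq, background=0.25):
    """Sum of per-position log2 odds against a uniform background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


# Hypothetical aligned binding sites.
pwm = build_pwm(["ACGT", "ACGA", "ACGT"])
consensus_score = log_odds_score(pwm, "ACGT")
mismatch_score = log_odds_score(pwm, "TTTT")
```

Each column of the matrix is a probability distribution over the four nucleotides, so a sequence close to the consensus scores higher than one that mismatches at most positions.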

    Molecular dynamics simulations through GPU video games technologies

    Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the more recent techniques used to understand the physical movements of atoms and molecules is molecular dynamics (MD). MD is an in silico method to simulate the physical motions of atoms and molecules under certain conditions. It has become a strategic technique and now plays a key role in many areas of the exact sciences, such as chemistry, biology, physics and medicine. Due to their complexity, MD calculations can require enormous amounts of computer memory and time, so their execution has been a major problem. Despite the huge computational cost, molecular dynamics has traditionally been implemented on conventional computers using the central processing unit (CPU). Graphics processing unit (GPU) technology was first designed to improve video games by rapidly creating images in a frame buffer for display on devices such as screens. The hybrid GPU-CPU implementation, combined with parallel computing, is a novel technology for performing a wide range of calculations. GPUs have been proposed and used to accelerate many scientific computations, including MD simulations. Herein, we describe the new methodologies developed initially for video games and how they are now applied in MD simulations

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biolog
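The Hamming distance mentioned in the abstract above is the number of positions at which two equal-length sequences differ. A minimal Python sketch follows; the function name and the example sequences are illustrative and are not taken from the primer itself.

```python
def hamming_distance(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # Compare the sequences position by position and count mismatches.
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Two hypothetical DNA sequences differing at positions 3 and 6.
distance = hamming_distance("GATTACA", "GACTATA")
```

The length check matters because the Hamming distance is only defined for sequences of equal length; without it, `zip` would silently ignore the extra characters of the longer sequence.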

    Finding a suitable performance testing tool

    Abstract. Finding the most suitable testing software for each project is a difficult task: in the field of stress and load testing, many tools are effective at finding certain kinds of problems but completely miss others. A silver bullet that solves all problems in a cost-effective and reliable way has not yet been found. This project was carried out as a systematic literature review to find out whether any documented solutions are capable of testing everything in a cost-effective way. The document starts with an introduction to the task, which originated from a real software testing company’s suggestion to find suitable test software that can cost-effectively and reliably fulfil the needs of the company. A history section describes why testing is important, the basics of testing, and what others have found in their studies of the area. The research method is described in detail, followed by results describing the tools found during the research, divided into sections by license type. Sectioning by license type was chosen for the benefit of testing companies that are interested in further developing the tools for their own purposes. The findings and the answers to the research questions are presented and discussed, followed by possible implications and suggestions for further research for future scholars interested in the matter. The systematic literature review identified a total of 40 different tools during the data extraction process. One complete software system was available commercially, including extensive support and help functions for the customer. A different approach, linking open-source and relatively inexpensive pieces of software together to achieve a composite solution, was also identified. That solution included the most common and most popular individual piece of software identified by the study. All pieces of software found are listed and briefly commented on, mainly with information originating from the authors’ home pages

    Book of Abstracts - XIII EUCARPIA Biometrics in Plant Breeding Section Meeting - 30 August - 1 September 2006 - Zagreb, Croatia

    The Book of Abstracts of the XIII EUCARPIA Biometrics in Plant Breeding Section Meeting, held in 2006 in Zagreb, Croatia, contains the abstracts of 40 oral presentations and 22 posters as presented during six sessions: Linkage and LD based QTL Mapping Methodology I and II, Computer Science, Bioinformatics and Analysis of Large Data Sets, Crop Growth Modelling / Modelling GxE, and Collaborative Breeding. All the abstracts have been thoroughly reviewed by the members of the Scientific Committee