67,502 research outputs found

    Genetic Improvement of computational biology software

    Get PDF
    There is a cultural divide between computer scientists and biologists that needs to be addressed. The two disciplines used to be quite unrelated but many new research areas have arisen from their synergy. We selectively review two multi-disciplinary problems: dealing with contamination in sequencing data repositories and improving software using biology inspired evolutionary computing. Through several examples, we show that ideas from biology may result in optimised code and provide surprising improvements that overcome challenges in speed and quality trade-offs. On the other hand, development of computational methods is essential for maintaining contamination free databases. Computer scientists and biologists must always be sceptical of each others data, just as they would be of their own

    Systems biology in animal sciences

    Get PDF
    Systems biology is a rapidly expanding field of research and is applied in a number of biological disciplines. In animal sciences, omics approaches are increasingly used, yielding vast amounts of data, but systems biology approaches to extract understanding from these data of biological processes and animal traits are not yet frequently used. This paper aims to explain what systems biology is and which areas of animal sciences could benefit from systems biology approaches. Systems biology aims to understand whole biological systems working as a unit, rather than investigating their individual components. Therefore, systems biology can be considered a holistic approach, as opposed to reductionism. The recently developed ‘omics’ technologies enable biological sciences to characterize the molecular components of life with ever increasing speed, yielding vast amounts of data. However, biological functions do not follow from the simple addition of the properties of system components, but rather arise from the dynamic interactions of these components. Systems biology combines statistics, bioinformatics and mathematical modeling to integrate and analyze large amounts of data in order to extract a better understanding of the biology from these huge data sets and to predict the behavior of biological systems. A ‘system’ approach and mathematical modeling in biological sciences are not new in itself, as they were used in biochemistry, physiology and genetics long before the name systems biology was coined. However, the present combination of mass biological data and of computational and modeling tools is unprecedented and truly represents a major paradigm shift in biology. Significant advances have been made using systems biology approaches, especially in the field of bacterial and eukaryotic cells and in human medicine. Similarly, progress is being made with ‘system approaches’ in animal sciences, providing exciting opportunities to predict and modulate animal traits

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    Get PDF
    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing shows that the accuracy of inferred languages for components of DNA are consistently accurate. By using the proposed algorithm languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Get PDF
    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach

    Assessment of the potential impacts of plant traits across environments by combining global sensitivity analysis and dynamic modeling in wheat

    Full text link
    A crop can be viewed as a complex system with outputs (e.g. yield) that are affected by inputs of genetic, physiology, pedo-climatic and management information. Application of numerical methods for model exploration assist in evaluating the major most influential inputs, providing the simulation model is a credible description of the biological system. A sensitivity analysis was used to assess the simulated impact on yield of a suite of traits involved in major processes of crop growth and development, and to evaluate how the simulated value of such traits varies across environments and in relation to other traits (which can be interpreted as a virtual change in genetic background). The study focused on wheat in Australia, with an emphasis on adaptation to low rainfall conditions. A large set of traits (90) was evaluated in a wide target population of environments (4 sites x 125 years), management practices (3 sowing dates x 2 N fertilization) and CO2CO_2 (2 levels). The Morris sensitivity analysis method was used to sample the parameter space and reduce computational requirements, while maintaining a realistic representation of the targeted trait x environment x management landscape (∼\sim 82 million individual simulations in total). The patterns of parameter x environment x management interactions were investigated for the most influential parameters, considering a potential genetic range of +/- 20% compared to a reference. Main (i.e. linear) and interaction (i.e. non-linear and interaction) sensitivity indices calculated for most of APSIM-Wheat parameters allowed the identifcation of 42 parameters substantially impacting yield in most target environments. Among these, a subset of parameters related to phenology, resource acquisition, resource use efficiency and biomass allocation were identified as potential candidates for crop (and model) improvement.Comment: 22 pages, 8 figures. This work has been submitted to PLoS On

    Spaced seeds improve k-mer-based metagenomic classification

    Full text link
    Metagenomics is a powerful approach to study genetic content of environmental samples that has been strongly promoted by NGS technologies. To cope with massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement of classification accuracy as opposed to traditional contiguous k-mers. We support this thesis through a series a different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.Comment: 23 page
    • …
    corecore