Causality, Information and Biological Computation: An algorithmic software approach to life, disease and the immune system
Biology has taken strong steps towards becoming a computer science aiming at
reprogramming nature after the realisation that nature herself has reprogrammed
organisms by harnessing the power of natural selection and the digital
prescriptive nature of replicating DNA. Here we further unpack ideas related to
computability, algorithmic information theory and software engineering, in the
context of the extent to which biology can be (re)programmed, and with how we
may go about doing so in a more systematic way with all the tools and concepts
offered by theoretical computer science in a translation exercise from
computing to molecular biology and back. These concepts provide a means of
hierarchical organization, thereby blurring previously clear-cut lines between
concepts such as matter and life, or between tumour types that are regarded as
different yet may not have different causes. This does not diminish
the properties of life or make its components and functions less interesting.
On the contrary, this approach makes for a more encompassing and integrated
view of nature, one that subsumes observer and observed within the same system,
and can generate new perspectives and tools with which to view complex diseases
like cancer, approaching them afresh from a software-engineering viewpoint that
casts evolution in the role of programmer, cells as computing machines, DNA and
genes as instructions and computer programs, viruses as hacking devices, the
immune system as a software debugging tool, and diseases as an
information-theoretic battlefield where all these forces deploy. We show how
information theory and algorithmic programming may explain fundamental
mechanisms of life and death.

Comment: 30 pages, 8 figures. Invited chapter contribution to Information and
Causality: From Matter to Life. Sara I. Walker, Paul C.W. Davies and George
Ellis (eds.), Cambridge University Press.
Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases
Recent advances of information technology in biomedical sciences and other
applied areas have created numerous large diverse data sets with a high
dimensional feature space, which provide us with a tremendous amount of
information and new opportunities for improving the quality of human life.
Meanwhile, the continuous arrival of new data also creates great challenges,
requiring researchers to convert these raw data into scientific knowledge in
order to benefit from them. Association studies of complex diseases using SNP
data have become more and more popular in biomedical research in recent years.
In this paper, we present a review of recent statistical advances and
challenges for analyzing correlated high dimensional SNP data in genomic
association studies for complex diseases. The review includes both general
feature reduction approaches for high dimensional correlated data and more
specific approaches for SNP data, which include unsupervised haplotype
mapping, tag SNP selection, and supervised SNP selection using statistical
testing/scoring, statistical modeling, and machine learning methods, with an
emphasis on how to identify interacting loci.

Comment: Published at http://dx.doi.org/10.1214/07-SS026 in Statistics
Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
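The supervised SNP selection by statistical testing that the review surveys can be sketched as a per-SNP chi-square score against case/control status. The genotype coding, the toy data, and the `select_top_snps` helper below are illustrative assumptions, not a specific method from the review.

```python
# Sketch: score each SNP with a chi-square test on its 3x2 genotype-by-status
# contingency table, then keep the top-ranked loci. Toy data only.
from collections import Counter

def chi_square_score(genotypes, labels):
    """Chi-square statistic for a genotype (0/1/2) by status (0/1) table."""
    table = Counter(zip(genotypes, labels))
    n = len(labels)
    row = Counter(genotypes)   # genotype marginals
    col = Counter(labels)      # case/control marginals
    stat = 0.0
    for g in (0, 1, 2):
        for y in (0, 1):
            expected = row[g] * col[y] / n
            if expected > 0:
                stat += (table[(g, y)] - expected) ** 2 / expected
    return stat

def select_top_snps(snp_matrix, labels, k):
    """snp_matrix: one genotype vector per SNP. Returns indices of top-k SNPs."""
    scores = [(chi_square_score(col, labels), i) for i, col in enumerate(snp_matrix)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Toy cohort: SNP 0 tracks disease status, SNP 1 is independent noise.
snps = [
    [2, 2, 1, 2, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(select_top_snps(snps, labels, 1))  # → [0]
```

In practice one would use a library test with p-values and multiple-testing correction; the point here is only the scoring-and-ranking structure of univariate supervised selection.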
Designing Data-Driven Learning Algorithms: A Necessity to Ensure Effective Post-Genomic Medicine and Biomedical Research
Advances in sequencing technology have significantly contributed to shaping the area of genetics and enabled the identification of genetic variants associated with complex traits through genome-wide association studies. This has provided insights into genetic medicine, in which genetic factors influence variability in disease and treatment outcomes. On the other hand, the missing or hidden heritability suggests that host quality of life and other environmental factors may also influence differences in disease risk and drug/treatment responses in genomic medicine, and may orient biomedical research, even though this may be highly constrained by genetic capabilities. It is expected that combining these different factors can yield a paradigm shift toward personalized medicine and lead to more effective medical treatment. With existing “big data” initiatives and high-performance computing infrastructures, there is a need for data-driven learning algorithms and models that enable the selection and prioritization of relevant genetic variants (post-genomic medicine) and trigger effective translation into clinical practice. In this chapter, we survey and discuss existing machine learning algorithms and post-genomic analysis models that support the process of identifying valuable markers.
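The kind of data-driven variant prioritization the chapter calls for can be sketched as fitting a simple supervised model and ranking variants by learned weight. The simulated cohort, the plain logistic regression, and the absolute-weight ranking rule are illustrative assumptions, not a method from the chapter.

```python
# Sketch: model-based variant prioritization. Fit logistic regression by
# batch gradient descent on simulated binary variants, then rank variants
# by absolute weight. All data are simulated for illustration.
import math
import random

random.seed(1)

# Simulated cohort: variant 2 drives phenotype risk, the rest are noise.
X = [[random.randint(0, 1) for _ in range(4)] for _ in range(400)]
y = [1 if x[2] == 1 and random.random() < 0.9 else
     1 if random.random() < 0.1 else 0 for x in X]

def train(X, y, lr=0.1, epochs=200):
    """Plain logistic regression via batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi
            grad_b += err
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

w, _ = train(X, y)
ranking = sorted(range(len(w)), key=lambda j: -abs(w[j]))
print(ranking[0])  # the causal variant (index 2) should rank first
```

Multivariate models like this complement the univariate tests of classical association studies, since a weight is estimated jointly over all variants rather than one locus at a time.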
How to understand the cell by breaking it: network analysis of gene perturbation screens
Modern high-throughput gene perturbation screens are key technologies at the
forefront of genetic research. Combined with rich phenotypic descriptors they
enable researchers to observe detailed cellular reactions to experimental
perturbations on a genome-wide scale. This review surveys the current
state-of-the-art in analyzing perturbation screens from a network point of
view. We describe approaches to make the step from the parts list to the wiring
diagram by using phenotypes for network inference and integrating them with
complementary data sources. The first part of the review describes methods to
analyze one- or low-dimensional phenotypes like viability or reporter activity;
the second part concentrates on high-dimensional phenotypes showing global
changes in cell morphology, transcriptome or proteome.

Comment: Review based on ISMB 2009 tutorial; after two rounds of revision.
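The step from high-dimensional phenotypes to a wiring diagram can be sketched as correlation-based network inference: genes whose perturbations yield similar phenotype profiles are linked. The example profiles, the Pearson measure, and the 0.9 cutoff are illustrative assumptions rather than any specific method from the review.

```python
# Sketch: link genes whose perturbation phenotype profiles correlate
# strongly, as a toy version of phenotype-based network inference.
import math

def pearson(x, y):
    """Pearson correlation of two equal-length phenotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_network(profiles, cutoff=0.9):
    """profiles: dict gene -> phenotype vector. Returns edges between genes
    whose perturbation profiles correlate above the cutoff."""
    genes = sorted(profiles)
    return [(g, h) for i, g in enumerate(genes) for h in genes[i + 1:]
            if pearson(profiles[g], profiles[h]) > cutoff]

profiles = {
    "geneA": [1.0, 2.1, 0.4, 3.0],
    "geneB": [1.1, 2.0, 0.5, 2.9],   # mimics geneA's phenotype profile
    "geneC": [3.0, 0.2, 2.5, 0.1],   # distinct phenotype
}
print(infer_network(profiles))  # → [('geneA', 'geneB')]
```

Real analyses replace raw correlation with model-based inference and integrate complementary data sources, as the review discusses; this shows only the parts-list-to-edges step in its simplest form.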
Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality
Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets.

Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects.

Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan
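The pruning idea can be sketched as follows: an upper bound on the score of any expansion of a pattern lets the search discard whole branches without evaluating them. The toy score and bound below stand in for the paper's Bayesian score and its actual bound; only the pruning logic itself is taken from the abstract.

```python
# Sketch: depth-first expansion of SNP patterns with bound-based pruning.
# score(pattern) evaluates a pattern; bound(pattern) upper-bounds the score
# of every superset of that pattern, so branches that cannot beat the best
# score found so far are skipped.
def search(n_snps, max_size, score, bound):
    best_score, best_pattern = float("-inf"), None
    stack = [(i,) for i in range(n_snps)]
    while stack:
        pattern = stack.pop()
        s = score(pattern)
        if s > best_score:
            best_score, best_pattern = s, pattern
        # Prune: expand only if some superset could still beat the best.
        if len(pattern) < max_size and bound(pattern) > best_score:
            stack.extend(pattern + (j,) for j in range(pattern[-1] + 1, n_snps))
    return best_pattern, best_score

def toy_score(pattern):
    # Toy stand-in: the pair {1, 3} interacts; larger patterns pay a
    # small complexity penalty.
    hit = 1.0 if {1, 3} <= set(pattern) else 0.0
    return hit - 0.1 * len(pattern)

def toy_bound(pattern):
    # Any superset has at least len(pattern) + 1 SNPs, so this
    # upper-bounds the toy score of every expansion.
    return 1.0 - 0.1 * (len(pattern) + 1)

print(search(5, 2, toy_score, toy_bound))  # → ((1, 3), 0.8)
```

The structure mirrors how the paper's bound is used: once a good pattern is found, any 1-SNP pattern whose bound cannot exceed it is never expanded, which is what makes the search feasible without exhaustive enumeration.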
Multi-scale modeling of gene-behavior associations in an artificial neural network model of cognitive development
In the multi-disciplinary field of developmental cognitive neuroscience, statistical associations between levels of description play an increasingly important role. One example of such associations is the observation of correlations between relatively common gene variants and individual differences in behavior. It is perhaps surprising that such associations can be detected despite the remoteness of these levels of description, and the fact that behavior is the outcome of an extended developmental process involving interaction with a variable environment. Given that they have been detected, how do such associations inform cognitive-level theories? To investigate this question, we employed a multi-scale computational model of development, using a sample domain drawn from the field of language acquisition. The model comprised an artificial neural network model of past-tense acquisition trained using the backpropagation learning algorithm, extended to incorporate population modeling and genetic algorithms. It included five levels of description, four internal: genetic, network, neurocomputation, behavior; and one external: environment. Since the mechanistic assumptions of the model were known and its operation was relatively transparent, we could evaluate whether cross-level associations gave an accurate picture of causal processes. We established that associations could be detected between artificial genes and behavioral variation, even under polygenic assumptions of a many-to-one relationship between genes and neurocomputational parameters, and when an experience-dependent developmental process interceded between the action of genes and the emergence of behavior. We evaluated these associations with respect to their specificity (to different behaviors, to function versus structure), to their developmental stability, and to their replicability, as well as considering issues of missing heritability and gene-environment interactions. 
We argue that gene-behavior associations can inform cognitive theory with respect to effect size, specificity, and timing. The model demonstrates a means by which researchers can undertake multi-scale modeling with respect to cognition, and develop highly specific and complex hypotheses across multiple levels of description.
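The polygenic, many-to-one relationship between genes and a neurocomputational parameter can be sketched in a few lines: several binary "genes" jointly set one parameter, development adds environmental noise, and a gene-behavior association is still detectable across a simulated population. The five-gene genome, the learning-rate mapping, and all numbers are illustrative assumptions, not the model's actual parameters.

```python
# Sketch: many-to-one genes-to-parameter mapping with noisy development,
# then a single-variant association test by difference in mean behavior.
import random

random.seed(0)

def develop(genome, env_noise=0.3):
    """The mean of 5 gene variants sets a learning-rate parameter;
    'behavior' is the outcome after noisy development."""
    learning_rate = 0.1 + 0.4 * (sum(genome) / len(genome))
    return learning_rate + random.gauss(0, env_noise * 0.1)

population = [[random.randint(0, 1) for _ in range(5)] for _ in range(500)]
behaviors = [develop(g) for g in population]

# Association of gene 0 with behavior, despite that gene explaining only
# one fifth of the underlying parameter.
carriers = [b for g, b in zip(population, behaviors) if g[0] == 1]
noncarriers = [b for g, b in zip(population, behaviors) if g[0] == 0]
effect = sum(carriers) / len(carriers) - sum(noncarriers) / len(noncarriers)
print(round(effect, 3))  # small positive effect, around 0.4/5 on average
```

This is the paper's point in miniature: the detectable single-gene effect is a fraction of the parameter's range, which is one face of the missing-heritability issue the authors discuss.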
Mathematics at the eve of a historic transition in biology
A century ago physicists and mathematicians worked in tandem and established
quantum mechanics. Indeed, algebras, partial differential equations, group
theory, and functional analysis underpin the foundation of quantum mechanics.
Currently, biology is undergoing a historic transition from qualitative,
phenomenological and descriptive to quantitative, analytical and predictive.
Mathematics, again, becomes a driving force behind this new transition in
biology.

Comment: 5 pages, 2 figures.