119,629 research outputs found
EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data
Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes
Recommended from our members
The application of software visualization technology to evolutionary computation: a case study in Genetic Algorithms
Evolutionary computation is an area within the field of artificial intelligence that is founded upon the principles of biological evolution. Evolution can be defined as the process of gradual development. Evolutionary algorithms are typically applied as a generic problem solving method, searching a problem space in order to locate good solutions. These solutions are found through an iterative evolutionary search that progresses by means of gradual developments.
In the majority of cases of evolutionary computation the user is not aware of their algorithm's search behaviour. This causes two problems. First, the user has no way of assuring the quality of any solutions found other than to compare the solutions found by the algorithm with any available benchmark solutions or to re-run the algorithm and check if the results can be repeated or improved upon. Second, because the user is unaware of the algorithm's behaviour they have no way of identifying the contribution of the different components of the algorithm and therefore, no direct way of analyzing the algorithm's design and assigning credit to good algorithm components, or locating and improving ineffective algorithm components.
The artificial intelligence and engineering communities have been slow to accept evolutionary computation as a robust problem-solving method because, unlike cased-based systems, rule-based systems or belief networks, they are unable to follow the algorithm's reasoning when locating a set of solutions in the problem space. During an evolutionary algorithm's execution the user may be able to see the results of the search but the search process itself like is a "black box" to the user. It is the search behaviour of evolutionary algorithms that needs to be understood by the user, in order for evolutionary computation to become more accepted within these communities.
The aim of software visualization is to help people understand and use computer software. Software visualization technology has been applied successfully to illustrate a variety of heuristic search algorithms, programming languages and data structures. This thesis adopts software visualization as an approach for illustrating the search behaviour of evolutionary algorithms.
Genetic Algorithms ("GAs") are used here as a specific case study to illustrate how software visualization may be applied to evolutionary computation. A set of visualization requirements are derived from the findings of a GA user study. A number of search space visualization techniques are examined for illustrating the search behaviour of a GA. "Henson," an extendable framework for developing visualization tools for genetic algorithms is presented. Finally, the application of the Henson framework is illustrated by the development of "Gonzo," a visualization tool designed to enable GA users to explore their algorithm's search behaviour.
The contributions made in this thesis extend into the areas of software visualization, evolutionary computation and the psychology of programming. The GA user study presented here is the first and only known study of the working practices of GA users. The search space visualization techniques proposed here have never been applied in this domain before, and the resulting interactive visualizations provide the GA user with a previously unavailable insight into their algorithm's operation
Prospects for computational steering of evolutionary computation
Currently, evolutionary computation (EC) typically takes place in batch mode: algorithms are run autonomously, with the user providing little or no intervention or guidance. Although it is rarely possible to specify in advance, on the basis of EC theory, the optimal evolutionary algorithm for a particular problem, it seems likely that experienced EC practitioners possess considerable tacit knowledge of how evolutionary algorithms work. In situations such as this, computational steering (ongoing, informed user intervention in the execution of an otherwise autonomous computational process) has been profitably exploited to improve performance and generate insights into computational processes. In this short paper, prospects for the computational steering of evolutionary computation are assessed, and a prototype example of computational steering applied to a coevolutionary algorithm is presented
Uncertainty in phylogenetic tree estimates
Estimating phylogenetic trees is an important problem in evolutionary
biology, environmental policy and medicine. Although trees are estimated, their
uncertainties are discarded by mathematicians working in tree space. Here we
explicitly model the multivariate uncertainty of tree estimates. We consider
both the cases where uncertainty information arises extrinsically (through
covariate information) and intrinsically (through the tree estimates
themselves). The importance of accounting for tree uncertainty in tree space is
demonstrated in two case studies. In the first instance, differences between
gene trees are small relative to their uncertainties, while in the second, the
differences are relatively large. Our main goal is visualization of tree
uncertainty, and we demonstrate advantages of our method with respect to
reproducibility, speed and preservation of topological differences compared to
visualization based on multidimensional scaling. The proposal highlights that
phylogenetic trees are estimated in an extremely high-dimensional space,
resulting in uncertainty information that cannot be discarded. Most
importantly, it is a method that allows biologists to diagnose whether
differences between gene trees are biologically meaningful, or due to
uncertainty in estimation.Comment: Final version accepted to Journal of Computational and Graphical
Statistic
From Social Simulation to Integrative System Design
As the recent financial crisis showed, today there is a strong need to gain
"ecological perspective" of all relevant interactions in
socio-economic-techno-environmental systems. For this, we suggested to set-up a
network of Centers for integrative systems design, which shall be able to run
all potentially relevant scenarios, identify causality chains, explore feedback
and cascading effects for a number of model variants, and determine the
reliability of their implications (given the validity of the underlying
models). They will be able to detect possible negative side effect of policy
decisions, before they occur. The Centers belonging to this network of
Integrative Systems Design Centers would be focused on a particular field, but
they would be part of an attempt to eventually cover all relevant areas of
society and economy and integrate them within a "Living Earth Simulator". The
results of all research activities of such Centers would be turned into
informative input for political Decision Arenas. For example, Crisis
Observatories (for financial instabilities, shortages of resources,
environmental change, conflict, spreading of diseases, etc.) would be connected
with such Decision Arenas for the purpose of visualization, in order to make
complex interdependencies understandable to scientists, decision-makers, and
the general public.Comment: 34 pages, Visioneer White Paper, see http://www.visioneer.ethz.c
MultiSeq: unifying sequence and structure data for evolutionary analysis
BACKGROUND: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. RESULTS: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. CONCLUSION: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for analyzing molecular dynamics simulations. Both are freely distributed by the NIH Resource for Macromolecular Modeling and Bioinformatics and MultiSeq is included with VMD starting with version 1.8.5. The MultiSeq website has details on how to download and use the software
Incorporating Road Networks into Territory Design
Given a set of basic areas, the territory design problem asks to create a
predefined number of territories, each containing at least one basic area, such
that an objective function is optimized. Desired properties of territories
often include a reasonable balance, compact form, contiguity and small average
journey times which are usually encoded in the objective function or formulated
as constraints. We address the territory design problem by developing graph
theoretic models that also consider the underlying road network. The derived
graph models enable us to tackle the territory design problem by modifying
graph partitioning algorithms and mixed integer programming formulations so
that the objective of the planning problem is taken into account. We test and
compare the algorithms on several real world instances
- …