152 research outputs found
Elastic Network Models in Biology: From Protein Mode Spectra to Chromatin Dynamics
Biomacromolecules perform their functions by accessing conformations energetically favored by their structure-encoded equilibrium dynamics. Elastic network model (ENM) analysis has been widely used to decompose the equilibrium dynamics of a given molecule into a spectrum of modes of motions, which separates robust, global motions from local fluctuations. The scalability and flexibility of the ENMs permit us to efficiently analyze the spectral dynamics of large systems or perform comparative analysis for large datasets of structures. I showed in this thesis how ENMs can be adapted (1) to analyze protein superfamilies that share similar tertiary structures but may differ in their sequence and functional dynamics, and (2) to analyze chromatin dynamics using contact data from Hi-C experiments, and (3) to perform a comparative analysis of genome topology across different types of cell lines. The first study showed that protein family members share conserved, highly cooperative (global) modes of motion. A low-to-intermediate frequency spectral regime was shown to have a maximal impact on the functional differentiation of families into subfamilies. The second study demonstrated the Gaussian Network Model (GNM) can accurately model chromosomal mobility and couplings between genomic loci at multiple scales: it can quantify the spatial fluctuations in the positions of gene loci, detect large genomic compartments and smaller topologically-associating domains (TADs) that undergo en bloc movements, and identify dynamically coupled distal regions along the chromosomes. The third study revealed close similarities between chromosomal dynamics across different cell lines on a global scale, but notable cell-specific variations in the spatial fluctuations of genomic loci. It also called attention to the role of the intrinsic spatial dynamics of chromatin as a determinant of cell differentiation. Together, these studies provide a comprehensive view of the versatility and utility of the ENMs in analyzing spatial dynamics of biomolecules, from individual proteins to the entire chromatin
3D Organization of Eukaryotic and Prokaryotic Genomes
There is a complex mutual interplay between three-dimensional (3D) genome organization and cellular activities in bacteria and eukaryotes. The aim of this thesis is to investigate such structure-function relationships.
A main part of this thesis deals with the study of the three-dimensional genome organization using novel techniques for detecting genome-wide contacts using next-generation sequencing. These so called chromatin conformation capture-based methods, such as 5C and Hi-C, give deep insights into the architecture of the genome inside the nucleus, even on a small scale. We shed light on the question how the vastly increasing Hi-C data can generate new insights about the way the genome is organized in 3D.
To this end, we first present the typical Hi-C data processing workflow to obtain Hi-C contact maps and show potential pitfalls in the interpretation of such contact maps using our own data pipeline and publicly available Hi-C data sets. Subsequently, we focus on approaches to modeling 3D genome organization based on contact maps. In this context, a computational tool was developed which interactively visualizes contact maps alongside complementary genomic data tracks. Inspired by machine learning with the help of probabilistic graphical models, we developed a tool that detects the compartmentalization structure within contact maps on multiple scales. In a further project, we propose and test one possible mechanism for the observed compartmentalization within contact maps of genomes across multiple species: Dynamic formation of loops within domains.
In the context of 3D organization of bacterial chromosomes, we present the first direct evidence for global restructuring by long-range interactions of a DNA binding protein. Using Hi-C and live cell imaging of DNA loci, we show that the DNA binding protein Rok forms insulator-like complexes looping the B. subtilis genome over large distances. This biological mechanism agrees with our model based on dynamic formation of loops affecting domain formation in eukaryotic genomes. We further investigate the spatial segregation of the E. coli chromosome during cell division. In particular, we are interested in the positioning of the chromosomal replication origin region based on its interaction with the protein complex MukBEF. We tackle the problem using a combined approach of stochastic and polymer simulations.
Last but not least, we develop a completely new methodology to analyze single molecule localization microscopy images based on topological data analysis. By using this new approach in the analysis of irradiated cells, we are able to show that the topology of repair foci can be categorized depending the distance to heterochromatin
Rapid prototyping of distributed systems of electronic control units in vehicles
Existing vehicle electronics design is largely divided by feature, with integration taking place at a late stage. This leads to a number of drawbacks, including longer development time and increased cost, both of which this research overcomes by considering the system as a whole and, in particular, generating an executable model to permit testing. To generate such a model, a number of inputs needed to be made available. These include a structural description of the vehicle electronics, functional descriptions of both the electronic control units and the communications buses, the application code that implements the feature and software patterns to implement the low-level interfaces to sensors and actuators. [Continues.
Recommended from our members
Machine learning methods for detecting structure in metabolic flow networks
Metabolic flow networks are large scale, mechanistic biological models with good predictive power.
However, even when they provide good predictions, interpreting the meaning of their structure can be very difficult, especially for large networks which model entire organisms.
This is an underaddressed problem in general, and the analytic techniques that exist currently are difficult to combine with experimental data.
The central hypothesis of this thesis is that statistical analysis of large datasets of simulated metabolic fluxes is an effective way to gain insight into the structure of metabolic networks.
These datasets can be either simulated or experimental, allowing insight on real world data while retaining the large sample sizes only easily possible via simulation.
This work demonstrates that this approach can yield results in detecting structure in both a population of solutions and in the network itself.
This work begins with a taxonomy of sampling methods over metabolic networks, before introducing three case studies, of different sampling strategies.
Two of these case studies represent, to my knowledge, the largest datasets of their kind, at around half a million points each.
This required the creation of custom software to achieve this in a reasonable time frame, and is necessary due to the high dimensionality of the sample space.
Next, a number of techniques are described which operate on smaller datasets.
These techniques, focused on pairwise comparison, show what can be achieved with these smaller datasets, and how in these cases, visualisation techniques are applicable which do not have simple analogues with larger datasets.
In the next chapter, Similarity Network Fusion is used for the first time to cluster organisms across several levels of biological organisation, resulting in the detection of discrete, quantised biological states in the underlying datasets.
This quantisation effect was maintained across both real biological data and Monte-Carlo simulated data, with related underlying biological correlates, implying that this behaviour stems from the network structure itself, rather than from the genetic or regulatory mechanisms that would normally be assumed.
Finally, Hierarchical Block Matrices are used as a model of multi-level network structure, by clustering reactions using a variety of distance metrics: first standard network distance measures, then by Local Network Learning, a novel approach of measuring connection strength via the gain in predictive power of each node on its neighbourhood.
The clusters uncovered using this approach are validated against pre-existing subsystem labels and found to outperform alternative techniques.
Overall this thesis represents a significant new approach to metabolic network structure detection, as both a theoretical framework and as technological tools, which can readily be expanded to cover other classes of multilayer network, an under explored datatype across a wide variety of contexts.
In addition to the new techniques for metabolic network structure detection introduced, this research has proved fruitful both in its use in applied biological research and in terms of the software developed, which is experiencing substantial usage.EPSR
Recent Advances in Embedded Computing, Intelligence and Applications
The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
Parsimonious Gene Correlation Network Analysis (PGCNA): a tool to define modular gene co-expression for refined molecular stratification in cancer
Cancers converge onto shared patterns that arise from constraints placed by the biology of the originating cell lineage and microenvironment on programs driven by oncogenic events. Here we define consistent expression modules reflecting this structure in colon and breast cancer by exploiting expression data resources and a new computationally efficient approach that we validate against other comparable methods. This approach, Parsimonious Gene Correlation Network Analysis (PGCNA), allows comparison of network structures between these cancer types identifying shared modules of gene co-expression reflecting: cancer hallmarks, functional and structural gene batteries, copy number variation and biology of originating lineage. These networks along with the mapping of outcome data at gene and module level provide an interactive resource that generates context for relationships between genes within and between such modules. Assigning module expression values (MEVs) provides a tool to summarize network level gene expression in individual cases illustrating potential utility in classification and allowing analysis of linkage between module expression and mutational state. Exploiting TCGA data thus defines both recurrent patterns of association between module expression and mutation at data-set level, and exemplifies the polarization of mutation patterns with the leading edge of module expression at individual case level. We illustrate the scalable nature of the approach within immune response related modules, which in the context of breast cancer demonstrates the selective association of immune subsets, in particular mast cells, with the underlying mutational pattern. Together our analyses provide evidence for a generalizable framework to enhance molecular stratification in cancer
Evolutionary Computation
This book presents several recent advances on Evolutionary Computation, specially evolution-based optimization methods and hybrid algorithms for several applications, from optimization and learning to pattern recognition and bioinformatics. This book also presents new algorithms based on several analogies and metafores, where one of them is based on philosophy, specifically on the philosophy of praxis and dialectics. In this book it is also presented interesting applications on bioinformatics, specially the use of particle swarms to discover gene expression patterns in DNA microarrays. Therefore, this book features representative work on the field of evolutionary computation and applied sciences. The intended audience is graduate, undergraduate, researchers, and anyone who wishes to become familiar with the latest research work on this field
Evolutionary genomics : statistical and computational methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering of the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward
Measuring Stability of 3D Chromatin Conformations and Identifying Neuron Specific Chromatin Loops Associated with Schizophrenia Risk
The 23 pairs of chromosomes comprising the human genome are intricately folded within the nucleus of each cell in a manner that promotes efficient gene regulation and cell function. Consequently, active gene rich regions are compartmentally segregated from inactive gene poor regions of the genome. To better understand the mechanisms driving compartmentalization we investigated what would occur if this system was disrupted. By digesting the genome to varying sizes and analyzing the fragmented 3D structure over time, our work revealed essential laws governing nuclear compartmentalization.
At a finer resolution within compartments, chromatin forms loop structures capable of regulating gene expression. Genome wide association studies have identified numerous single nucleotide polymorphisms (SNPs) associated with the neuropsychiatric disease schizophrenia. When these SNPs are not located within a gene it is difficult to gain insight into disease pathology; however, in some cases chromatin loops may link these noncoding schizophrenia risk variants to their pathological gene targets. By generating 3D genome maps, we identified and analyzed loops of glial cells, neural progenitor cells, and neurons thereby expanding the set of genes conferring schizophrenia risk.
The binding of T-cell receptors (TCRs) to foreign peptides on the surface of diseased cells triggers an immune response against the foreign invader. Utilizing available structural information of the TCR antigen interface, we developed computational methods for successful prediction of TCR-antigen binding. As this binding is a prerequisite for immune response, such improvements in binding prediction could lead to important advancements in the fields of autoimmunity and TCR design for cancer therapeutics
- …