Biophysical Perspective: The Latest Twists in Chromatin Remodeling
In its most restrictive interpretation, the notion of chromatin remodeling refers to the action of chromatin remodeling enzymes on nucleosomes with the aim of displacing and removing them from the chromatin fiber (the effective polymer formed by a DNA molecule and proteins). This local modification of the fiber structure can have consequences for the initiation and repression of transcription and, when the remodeling process spreads along the fiber, also results in long-range effects essential for fiber condensation. Three regulatory levels of relevance can be distinguished for this process: first, the intrinsic sequence preference of the histone octamer, which governs the positioning of the nucleosome along the DNA, notably in relation to the genetic information coded in DNA; second, the recognition or selection of nucleosomal substrates by remodeling complexes; and third, the motor action exerted on the nucleosome by the chromatin remodeler. On each of these three levels, recent work has provided crucial insights that add new twists to this exciting and unfinished story, which we highlight in this perspective.
The Impact of Base Stacking on the Conformations and Electrostatics of Single-Stranded DNA
Single-stranded DNA (ssDNA) is notable for its interactions with ssDNA binding proteins (SSBs) during fundamentally important biological processes, including DNA repair and replication. Previous work has begun to characterize the conformational and electrostatic properties of ssDNA in association with SSBs. However, the conformational distributions of free ssDNA have been difficult to determine. To capture the vast array of ssDNA conformations in solution, we pair small-angle X-ray scattering (SAXS) with novel ensemble fitting methods, obtaining key parameters such as the size, shape and stacking character of strands with different sequences. Complementary ion counting measurements using inductively coupled plasma atomic emission spectroscopy are employed to determine the composition of the ion atmosphere at physiological ionic strength. Applying this combined approach to poly dA and poly dT, we find that the global properties of these sequences are very similar, despite their vastly different propensities for single-stranded helical stacking. These results suggest that a relatively simple mechanism may be at play in the binding of ssDNA to non-specific SSBs, which explains the disparity in binding affinities observed for these systems.
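Size parameters such as the radius of gyration (Rg) are routinely read off small-angle X-ray scattering (SAXS) profiles. As an illustration only, the sketch below applies the standard Guinier approximation, ln I(q) ≈ ln I(0) − q²·Rg²/3, which holds roughly for q·Rg ≲ 1.3. It is not the ensemble fitting method described above; the function name, the `q_max` cutoff and the synthetic profile are assumptions.

```python
import numpy as np


def guinier_fit(q, intensity, q_max):
    """Fit ln I(q) = ln I(0) - (Rg^2 / 3) q^2 over 0 < q <= q_max.

    Returns (rg, i0, q_max * rg); the last value should stay below ~1.3
    for the Guinier approximation to hold.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    sel = (q > 0) & (q <= q_max) & (intensity > 0)
    # Linear regression of ln I against q^2; polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(q[sel] ** 2, np.log(intensity[sel]), 1)
    if slope >= 0:
        raise ValueError("no Guinier decay in the selected q range")
    rg = float(np.sqrt(-3.0 * slope))
    i0 = float(np.exp(intercept))
    return rg, i0, q_max * rg


# Synthetic, noise-free profile with Rg = 30 (in units of 1/q, e.g. Å).
q = np.linspace(0.005, 0.04, 50)
profile = 100.0 * np.exp(-(q * 30.0) ** 2 / 3.0)
rg, i0, qrg = guinier_fit(q, profile, q_max=0.04)
```

On real data the fit range would be chosen so that `qrg` stays below about 1.3, and noise would make the recovered Rg approximate rather than exact.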
Analysing and quantitatively modelling nucleosome binding preferences
The main emphasis of my work as a PhD student was the analysis and prediction of nucleosome positioning, focusing on the role sequence features play.
Part I gives a broad overview of nucleosomes, before defining important technical terms. It continues by describing and reviewing experiments that measure nucleosome positioning and bioinformatic methods that learn the sequence preferences of nucleosomes to predict their positioning.
Part II describes a collaborative project with the Gaul lab, in which I analyzed MNase-Seq measurements of nucleosomes in Drosophila. The original intention was to investigate the extent to which experimental biases influence the measurements. We extended the analysis to categorize and explore fragile, average and resistant nucleosome populations. I focused on the relation between nucleosome fragility and the sequence landscape, especially at promoters and enhancers. Analyzing the partial unwrapping of nucleosomes genome-wide, I found that the G+C ratio is a determinant of asymmetric unwrapping. I excluded from this work an analysis of histone modifications that was part of this collaboration, because of its low relevance to the rest of the presented work.
Part III describes my main project: developing a probabilistic nucleosome-position prediction method. I developed a maximum likelihood approach to learn a biophysical model of nucleosome binding. By incorporating the low positional resolution of MNase-Seq and the sequence bias of CC-Seq into the likelihood, I could separate these effects from the nucleosome binding preferences and learn highly correlated nucleosome binding energy models. My analysis shows that nucleosomes have a position-specific binding preference and may be unaffected by G+C content or even disfavor it, contrary to the consensus in the literature.
Part IV describes further analyses I performed during my PhD that are not part of any planned publication. The main topics are ancillary elements of my main project, unsuccessful attempts to correct experimental biases, analysis of the quality of experimental measurements, and adapting my probabilistic nucleosome-position prediction method to work with occupancy measurements. Lastly, I give a general outlook that reflects on my results and discusses next steps, such as ways to further improve my method. I excluded from this thesis two collaboration projects I participated in because they are still ongoing: a systematic analysis of how the core promoter sequence influences gene expression in Drosophila, and the development of an experiment to measure nucleosome occupancy more precisely.
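To make the idea of a position-specific binding energy model and its likelihood concrete, here is a minimal sketch under strong simplifying assumptions: one nucleosome per sequence, an additive per-position energy matrix with hypothetical values in units of kT, and Boltzmann-weighted start positions. It omits the MNase-Seq resolution and CC-Seq bias terms mentioned above and is not the thesis's model; all names are hypothetical.

```python
import math

BASES = "ACGT"


def binding_energies(seq, energy_matrix):
    """Energy (in kT) of a nucleosome starting at each admissible position.

    energy_matrix[j][b] is the hypothetical contribution of base BASES[b]
    at offset j within the footprint; the footprint width is len(energy_matrix).
    Only upper-case A/C/G/T are accepted (BASES.index raises otherwise).
    """
    width = len(energy_matrix)
    return [
        sum(energy_matrix[j][BASES.index(seq[start + j])] for j in range(width))
        for start in range(len(seq) - width + 1)
    ]


def start_probabilities(seq, energy_matrix):
    """Single-nucleosome Boltzmann model: P(start) is proportional to exp(-E(start))."""
    energies = binding_energies(seq, energy_matrix)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def log_likelihood(seq, energy_matrix, observed_starts):
    """Log-likelihood of observed start positions under the model above."""
    probs = start_probabilities(seq, energy_matrix)
    return sum(math.log(probs[s]) for s in observed_starts)


# Toy example: a 4-bp footprint whose (hypothetical) energies favour G.
toy_matrix = [[0.0, 0.0, -1.0, 0.0] for _ in range(4)]
probs = start_probabilities("AAAAGGGGAAAA", toy_matrix)
```

With a realistic 147-bp footprint, `energy_matrix` would have 147 rows, and a maximum likelihood fit would adjust its entries to maximise `log_likelihood` over measured nucleosome positions.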
The mechanical genome: inquiries into the mechanical function of genetic information
The four possible nucleotides, A, T, C and G, that link together to form DNA molecules, and whose ordering encodes genetic information, differ not only in name but also in their physical and chemical properties. As a result, DNA molecules with different sequences behave differently: one sequence may yield a very flexible DNA molecule, another a very stiff one. A DNA molecule with a given sequence may be straight or intrinsically curved. This creates an interplay between the information stored in a DNA molecule on the one hand and the physical properties of that molecule on the other. This interplay is of great importance in our cells, where lengths of DNA far longer than the cells that contain them must be extensively folded. The research presented in this thesis examines how this interplay can be modelled, what its effects can be, and whether nature has used it to encode mechanical signals into real genomes.
Practical Approaches to Biological Network Discovery
This dissertation addresses an outstanding problem in the field of systems biology: identifying the structure of a transcriptional network from high-throughput experimental data. Understanding the connectivity of a transcriptional network is an important piece of the puzzle that relates the genotype of an organism to its phenotypes. An overwhelming number of computational approaches have been proposed to perform integrative analyses on large collections of high-throughput gene expression datasets to infer the structure of transcriptional networks. I put forth a methodology by which these tools can be evaluated and compared against one another to better understand their strengths and weaknesses. Next, I use high-throughput datasets to learn new and interesting network biology in the pathogenic fungus Cryptococcus neoformans. Finally, I propose a novel computational method for mapping transcriptional networks that unifies two orthogonal strategies for network inference. I apply this method to map the transcriptional network of Saccharomyces cerevisiae and demonstrate how network inference results can complement chromatin immunoprecipitation (ChIP) experiments, which directly probe the binding events of transcriptional regulators. Collectively, my contributions improve both the accessibility and practicality of network inference methods.
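Comparing network inference tools typically means scoring a ranked list of predicted regulator-target edges against a gold standard. The sketch below computes average precision, one common summary of the precision-recall curve. It is an assumption that this resembles the evaluation methodology of the dissertation, and the regulator and gene names are placeholders.

```python
def average_precision(ranked_edges, gold_standard):
    """Average precision of a ranked edge list against a set of true edges.

    ranked_edges: predicted (regulator, target) pairs, most confident first.
    gold_standard: iterable of true (regulator, target) pairs.
    """
    gold = set(gold_standard)
    if not gold:
        raise ValueError("gold standard is empty")
    true_positives = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            true_positives += 1
            # precision at the rank where each true edge is recovered
            precision_sum += true_positives / rank
    # Gold-standard edges that are never predicted contribute zero precision.
    return precision_sum / len(gold)


ranked = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g1"), ("TF2", "g3")]
gold = {("TF1", "g1"), ("TF2", "g1")}
ap = average_precision(ranked, gold)
```

Here the two true edges are recovered at ranks 1 and 3, so the average precision is (1 + 2/3) / 2 ≈ 0.83.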
Improved methods for single-particle cryogenic electron microscopy
Biological macromolecules such as enzymes are nanoscale machines. This is true in a concrete sense: if the atomic structure of a biological macromolecule can be obtained, the theories of mechanics and intermolecular forces can be applied to explain how the machine works in terms that engineers would understand, including motors, ratchets, gates and transducers. Nevertheless, biological macromolecules are complex, fragile and extremely small, so obtaining their structures is a challenging experimental endeavor. Single-particle cryogenic electron microscopy (cryo-EM) is a technique for determining the 3D structure of a biological macromolecule from a large set of 2D electron micrographs of individual, structurally identical particles. To obtain such images, a solution of the macromolecules must be prepared in the frozen-hydrated state, embedded in a thin, electron-transparent glassy film of water. This specimen must then be imaged with a very short exposure to avoid radiation damage. A powerful computer is then used to sort, align, and average the 2D particle images to back-calculate the 3D structure. At its best, cryo-EM can determine the structures of biological macromolecules to atomic resolution; in practice, this goal is usually not achieved. Cryo-EM has become significantly more powerful in the past few years owing to improvements in equipment and methodology. Several of the most significant advances originated in the labs of David Agard and Yifan Cheng at UCSF. When I began my PhD with Yifan, the spirit in the lab was that cryo-EM could keep getting better: with enough engineering, determining the 3D structure of an arbitrary biological macromolecule would become as routine as gel electrophoresis or DNA sequencing. Inspired, I took on projects in the lab that I thought would move the field closer to that goal.
In the first chapter of this thesis, I describe work I did supporting a project initiated by David Agard and his long-time scientific programmer Shawn Zheng. They developed and implemented an algorithm, MotionCor2, for correcting the complex, anisotropic movements that occur when a frozen-hydrated specimen interacts with the high-energy electron beam. My role was to benchmark MotionCor2 on a panel of real-world 3D reconstruction tasks. I showed that MotionCor2 restored the highest-resolution details in the images, ultimately yielding significantly better structures than simpler algorithms. For me, this project highlighted the importance of benchmarking an algorithm under routine real-world conditions with the right metrics. In chapter 1, I include the manuscript for the MotionCor2 study, formatted to highlight my contributions, which were moved to the supplement in the original publication in Nature Methods. One of the major remaining issues with cryo-EM is sample preparation: preparing thin, freestanding films of frozen-hydrated particles necessarily exposes those particles to air-water interfaces. Many fragile macromolecular complexes denature when exposed to such interfaces, preventing structure determination by cryo-EM. In chapters 2 and 3, I describe my efforts to develop a simple, robust approach to stabilizing fragile macromolecular complexes during vitrification. In chapter 2, I develop a method for coating EM grids with an electron-transparent and functionalizable graphene oxide (GO) support film and demonstrate that such GO grids are compatible with high-resolution structure determination. This work was published in the Journal of Structural Biology in 2018. In chapter 3, I extend this work by functionalizing GO grids with nucleic acids, enabling routine structure determination of uncrosslinked chromatin specimens.
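One example of a simpler approach than MotionCor2's local, anisotropic correction is whole-frame (global) alignment, in which each movie frame is shifted rigidly onto a reference before summing. The following sketch estimates integer rigid shifts by FFT cross-correlation; it is a hypothetical illustration of such a baseline, not MotionCor2, and the function names and synthetic "movie" are assumptions.

```python
import numpy as np


def estimate_shift(ref, frame):
    """Integer (dy, dx) such that frame is approximately np.roll(ref, (dy, dx), axis=(0, 1))."""
    # Circular cross-correlation via FFT; its peak sits at the relative shift.
    xc = np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(xc), xc.shape)
    # Indices past half the box correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, xc.shape))


def align_and_sum(frames):
    """Rigidly align every frame to the first one; return the sum and the shifts."""
    ref = np.asarray(frames[0], dtype=float)
    total = np.zeros_like(ref)
    shifts = []
    for frame in frames:
        dy, dx = estimate_shift(ref, frame)
        shifts.append((dy, dx))
        total += np.roll(frame, (-dy, -dx), axis=(0, 1))
    return total, shifts


rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 64))
movie = [ref, np.roll(ref, (3, -5), axis=(0, 1)), np.roll(ref, (-2, 4), axis=(0, 1))]
summed, shifts = align_and_sum(movie)
```

A single rigid shift per frame cannot follow motion that varies across the field of view, which is the kind of limitation that patch-based, anisotropic correction is designed to address.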
In ongoing work, I used nucleic acid grids to solve high-resolution structures of a highly fragile specimen, the SNF2h-nucleosome complex, and analyzed the conformational heterogeneity of the nucleosome substrate. These results were made possible by the nucleic acid grid, because the other major approach for stabilizing chromatin specimens, chemical crosslinking, does not work for this specimen.
Perhaps the most fundamental problem with single-particle cryo-EM is the radiation sensitivity of frozen-hydrated macromolecules. To image biological matter with electrons is to destroy it, so obtaining images of undamaged specimens requires very short, highly undersampled exposures. The resulting images are extremely noisy and low in contrast, with most particles barely distinguishable from the background. In chapter 4, I describe a novel computational approach to generating contrast in cryo-EM. Using a recently described machine learning strategy for training a parameterized denoising algorithm, I developed a computer program, restore, that denoises cryo-EM images, greatly enhancing their contrast and interpretability. This program leverages recent advances in computer vision and deep learning that have not yet been widely used in cryo-EM image processing. To characterize the algorithm's performance on real-world data, I extended conventional metrics of image resolution to measure how an arbitrary transformation affects images at different spatial frequencies. These novel metrics are general and may be useful for characterizing other nonlinear reconstruction algorithms in cryo-EM and medical imaging. Finally, I showed that denoised cryo-EM images retain the high-resolution information required for accurate 3D reconstruction. Denoising can be applied to conventional cryo-EM images and can be reversed whenever necessary. I have made the restore software publicly available and have submitted a manuscript for peer-reviewed publication.
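A standard example of a conventional resolution metric in cryo-EM is the Fourier ring correlation (FRC), which measures the agreement between two images ring by ring in spatial frequency. A minimal NumPy sketch of plain FRC for square images follows; it does not implement the extended metrics described above, and the integer-radius ring binning is an assumption of this sketch.

```python
import numpy as np


def fourier_ring_correlation(img1, img2):
    """Plain FRC between two square images of equal size, one value per ring.

    Ring r collects Fourier pixels whose integer radius from the zero-frequency
    centre equals r, for r = 0 .. n//2 - 1.
    """
    img1 = np.asarray(img1, dtype=float)
    img2 = np.asarray(img2, dtype=float)
    if img1.shape != img2.shape or img1.shape[0] != img1.shape[1]:
        raise ValueError("expected two square images of the same shape")
    n = img1.shape[0]
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))
    y, x = np.indices((n, n))
    radius = np.hypot(x - n // 2, y - n // 2).astype(int)
    inside = radius < n // 2  # skip the corners beyond the inscribed circle
    rings = radius[inside]
    cross = np.bincount(rings, weights=np.real(f1 * np.conj(f2))[inside])
    power1 = np.bincount(rings, weights=(np.abs(f1) ** 2)[inside])
    power2 = np.bincount(rings, weights=(np.abs(f2) ** 2)[inside])
    denom = np.sqrt(power1 * power2)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


rng = np.random.default_rng(1)
image = rng.normal(size=(64, 64))
frc_self = fourier_ring_correlation(image, image)
frc_noise = fourier_ring_correlation(image, rng.normal(size=(64, 64)))
```

An image compared with itself gives an FRC of 1 in every ring, while independent noise fluctuates around 0. In practice FRC is computed between two independently processed halves of a dataset.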
Intrinsically Disordered Proteins within the Genome
The hundreds of millions of DNA base pairs within eukaryotic cells are not found free but are packed inside micrometre-sized nuclei through the formation of a macromolecular structure known as chromatin. Chromatin consists of a chain of nucleosomes: nucleoprotein complexes in which the DNA makes ∼1.75 turns around a protein octamer core composed of two copies each of the histones H2A, H2B, H3 and H4. A fifth histone, H1, binds on the nucleosomal surface close to the entry/exit site of the DNA, interacts with linker DNA and aids in chromatin compaction. Condensing DNA to fit into the nucleus is, however, only half of chromatin's role. The three-dimensional spatial organization of chromatin serves a second important role by enabling control over gene expression. Chromatin structure thus adds a layer of complexity above the genomic code, allowing different proteins to be transcribed depending on cell lineage and cell-cycle stage.
The proteins that make up, modify and read the chromatin structure are particularly enriched in ‘intrinsic disorder’, a property of proteins that lack a well-defined structure and instead exist as a dynamic ensemble of rapidly interconverting states. While folded proteins with well-defined structures are amenable to characterization by standard methods of protein structure determination, the ‘plasticity’ of disordered proteins challenges the use of such ensemble-averaged techniques. In this thesis, Molecular Dynamics simulations are used to characterize the disordered regions of three proteins that form the core of chromatin structure: histones, linker histones (H1) and heterochromatin protein 1 (HP1). When bound to the nucleosome, the carboxy-terminal domain of H1 adopts a compact but unstructured conformation that allows it to position itself between the two linker DNA strands. In contrast, the amino-terminal domain of H1 undergoes a disorder-to-order transition to an amphiphilic helical conformation. This transition is, however, subtype-dependent, with the degree of condensation varying with the subtypes' nucleosomal affinity. Finally, the simulations demonstrate that the affinity of HP1 subtypes for histone H3 is caused by the synergistic effects of both the proteins' unstructured amino-terminal domain and the structured chromodomain.
Three-dimensional Folding of Eukaryotic Genomes
Chromatin packages eukaryotic genomes via a hierarchical series of folding steps, encoding multiple layers of epigenetic information that can regulate nuclear transactions in response to complex environmental signals. Beyond the one-dimensional chromatin landscape, such as nucleosome positioning and histone modifications, little is known about secondary chromatin structures and their functional consequences for transcriptional regulation and DNA replication. The family of chromosome conformation capture (3C) assays has revolutionized our understanding of large-scale chromosome folding by measuring relative interaction probabilities between genomic loci in vivo. However, the limited resolution of typical 3C techniques leaves nucleosome-level interactions and 30 nm structures inaccessible, and also restricts their applicability to studying gene-level chromatin folding in organisms with small genomes, such as yeasts, worms, and plants. To uncover this “blind spot” of chromatin organization, I developed a method called Micro-C and an improved protocol, Micro-C XL, which enable mapping of chromatin structures at all scales, from single nucleosomes to the entire genome. Micro-C has identified several fine-scale aspects of chromatin folding in budding and fission yeasts, including histone tail-mediated tri-/tetra-nucleosome stacking, gene crumples/globules, and chromosomally-interacting domains (CIDs). CIDs are spatially demarcated by boundaries that colocalize with the promoters of actively transcribed genes and with histone marks for active transcription or turnover. Chromatin compaction is regulated in both transcription-dependent and transcription-independent manners: perturbing transcription or mutating chromatin regulators strongly affects global chromatin folding.
Taken together, Micro-C reveals chromatin folding behaviors at sub-kilobase scales and opens an avenue to study chromatin organization in many biological systems.
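Domain boundaries such as those demarcating CIDs are often located in contact maps with an insulation-style score: the average contact frequency across a window straddling each bin dips at a boundary. The sketch below is a generic version of that idea on a binned contact matrix; the window size, pseudocount and toy matrix are assumptions, and it is not claimed to be how CIDs were called in this work.

```python
import numpy as np


def insulation_score(contacts, window, pseudocount=1e-9):
    """Log2 insulation-style score for each bin of a symmetric contact matrix.

    For bin i, average the contacts between the `window` bins upstream and the
    `window` bins downstream of i; low values suggest a domain boundary.
    Bins too close to the matrix edge get NaN.
    """
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window):
        raw[i] = m[i - window:i, i + 1:i + 1 + window].mean()
    valid = ~np.isnan(raw)
    scores = np.full(n, np.nan)
    # Normalise by the mean over all scored bins, then take log2.
    scores[valid] = np.log2((raw[valid] + pseudocount) / (raw[valid].mean() + pseudocount))
    return scores


# Toy map: two 10-bin domains with dense internal and sparse cross-domain contacts.
toy = np.ones((20, 20))
toy[:10, :10] = 10.0
toy[10:, 10:] = 10.0
scores = insulation_score(toy, window=3)
boundary = int(np.nanargmin(scores))
```

In this toy map the score reaches its minimum at the bins flanking the junction of the two domains, which is where a boundary call would be placed.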
Epigenetic modelling: DNA methylation and working towards model parameterisation
The main focus of the research in this thesis is the investigation of DNA methylation mechanisms in epigenetics and the study of a specific database. As part of the latter work, the role of curation is described, and a new knowledge management system, PathEpigen, is reported that is currently being developed for colon cancer at the Sci-Sym centre. The database deals with genetic and epigenetic interactions and contains considerable data on genetic and epigenetic molecular events. The curated data include biomedical and biological information. An efficient method was devised to extract biological information from the literature and to process, manage and update the data. We present a Deterministic Finite Automaton (DFA) model for the DNA methylation mechanism controlled by DNA methyltransferase (DNMT) enzymes. This thesis provides a brief introduction to epigenetics, a survey of ongoing research on computational epigenetics and a description of the DNA methylation database. It also gives an overview of DNA methylation and its importance in cancer. The DFA models three states of methylation frequency (normal, de novo and hypermethylated) in the cell. It was executed on random input strings of length 100. Of the strings considered, we found that 26%, 37% and 37% correspond to the normal, de novo (cancer initiation) and hypermethylated (cancer) states, respectively.
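The abstract does not specify the DFA's alphabet or transition table, so the following is a purely hypothetical toy automaton that only illustrates the procedure: it reuses the three state names above, assumes a two-symbol alphabet ('M' for a DNMT-mediated methylation event, 'U' for its absence) with invented transitions, and tallies final states over random strings of length 100. Its output is not expected to reproduce the reported percentages.

```python
import random

# Hypothetical transition table (not taken from the thesis):
# (current state, input symbol) -> next state.
TRANSITIONS = {
    ("normal", "M"): "de_novo",
    ("normal", "U"): "normal",
    ("de_novo", "M"): "hypermethylated",
    ("de_novo", "U"): "normal",
    ("hypermethylated", "M"): "hypermethylated",
    ("hypermethylated", "U"): "de_novo",
}


def run_dfa(symbols, start="normal"):
    """Feed the symbols through the DFA and return the final state."""
    state = start
    for symbol in symbols:
        state = TRANSITIONS[(state, symbol)]  # KeyError for symbols outside {M, U}
    return state


def tally(n_strings=1000, length=100, seed=0):
    """Fraction of random strings ending in each state."""
    rng = random.Random(seed)
    counts = {"normal": 0, "de_novo": 0, "hypermethylated": 0}
    for _ in range(n_strings):
        word = "".join(rng.choice("MU") for _ in range(length))
        counts[run_dfa(word)] += 1
    return {state: count / n_strings for state, count in counts.items()}


fractions = tally()
```

Because a DFA's final state depends only on the input it has read, the resulting fractions are determined entirely by the chosen transition table and by how the random strings are generated.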