22 research outputs found
Deep Learning for Population Genetic Inference
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography and the regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
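The input-to-output mapping described in this abstract (summary statistics in, model parameters out) can be illustrated with a deliberately small sketch. Everything below is an assumption made for illustration: the `simulate` function is a toy stand-in for a population genetic simulator, the three "summary statistics" are invented, and the one-hidden-layer network is far smaller than anything the paper would have used. It only shows the general likelihood-free pattern of simulating data under known parameters and training a network to recover them.

```python
import math
import random


def simulate(theta, rng):
    """Toy stand-in for a simulator (hypothetical): three noisy summary statistics."""
    return [
        theta + rng.gauss(0, 0.05),       # informative, linear in theta
        theta ** 2 + rng.gauss(0, 0.05),  # informative, non-linear in theta
        rng.gauss(0, 0.05),               # pure noise, carries no signal
    ]


def init_net(n_in, n_hidden, rng):
    """One hidden tanh layer followed by a linear output unit."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return {"w1": w1, "b1": [0.0] * n_hidden, "w2": w2, "b2": 0.0}


def forward(net, x):
    """Return the hidden activations and the predicted parameter."""
    h = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    y = sum(w * hi for w, hi in zip(net["w2"], h)) + net["b2"]
    return h, y


def sgd_step(net, x, target, lr):
    """One stochastic gradient step on the squared error 0.5 * (y - target)^2."""
    h, y = forward(net, x)
    err = y - target
    for j, hj in enumerate(h):
        # Backpropagate through tanh using the output weight before it is updated.
        grad_pre = err * net["w2"][j] * (1.0 - hj * hj)
        net["w2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * grad_pre * xi
        net["b1"][j] -= lr * grad_pre
    net["b2"] -= lr * err


def train(n_train=200, epochs=200, lr=0.05, seed=0):
    """Simulate a training set with known parameters, then fit the network."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_train):
        theta = rng.random()  # parameter drawn from a uniform prior on [0, 1]
        data.append((simulate(theta, rng), theta))
    net = init_net(3, 8, rng)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, theta in data:
            sgd_step(net, x, theta, lr)
    return net


def held_out_mse(net, n_test=200, seed=1):
    """Mean squared error of the predicted parameter on fresh simulations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_test):
        theta = rng.random()
        _, pred = forward(net, simulate(theta, rng))
        total += (pred - theta) ** 2
    return total / n_test
```

Calling `held_out_mse(train())` gives the error on simulations the network has not seen. In this toy setting a useful point of comparison is the variance of the uniform prior (1/12), which is the error obtained by always predicting the prior mean.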
Structural Prediction of Protein-Protein Interactions by Docking: Application to Biomedical Problems
A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein-protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein-protein interactions, or providing modeled structural data for drug discovery targeting protein-protein interactions.
Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
Peer reviewed. Postprint (author's final draft).
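The two docking phases named in the abstract above, (i) sampling binding modes and (ii) scoring them, can be sketched with a toy two-dimensional example. This is an illustrative assumption rather than any published docking program: the "molecules" are point sets, poses are random rigid-body rotations and translations, and the scoring function, with its hypothetical `contact` and `clash` cutoffs and clash penalty, simply rewards close contacts and penalizes overlaps.

```python
import math
import random


def transform(points, angle, dx, dy):
    """Apply a rigid-body rotation (radians) and translation to 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def score(receptor, ligand, contact=1.5, clash=0.8):
    """Toy score: +1 per contact pair within `contact`, -10 per clash below `clash`."""
    total = 0.0
    for rx, ry in receptor:
        for lx, ly in ligand:
            d = math.hypot(rx - lx, ry - ly)
            if d < clash:
                total -= 10.0
            elif d <= contact:
                total += 1.0
    return total


def dock(receptor, ligand, n_poses=2000, box=5.0, seed=0):
    """Phase (i): sample random rigid-body poses. Phase (ii): rank them by score."""
    rng = random.Random(seed)
    poses = []
    for _ in range(n_poses):
        pose = (
            rng.uniform(0.0, 2.0 * math.pi),
            rng.uniform(-box, box),
            rng.uniform(-box, box),
        )
        poses.append((score(receptor, transform(ligand, *pose)), pose))
    poses.sort(key=lambda item: item[0], reverse=True)  # best-scoring pose first
    return poses
```

For example, `dock([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0)])` returns a list of `(score, (angle, dx, dy))` tuples sorted from best to worst. Real docking methods differ mainly in how exhaustively they sample and in how physically detailed the scoring function is.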
Transcription regulation: models for combinatorial regulation and functional specificity
Gene regulation is controlled by transcription factor proteins that bind to specific DNA sequences, known as transcription factor binding sites (TFBSs). Combinations of transcription factors working co-operatively in cis-regulatory modules (CRMs) play a role in regulating gene expression. Current computational methods for TFBS prediction cannot distinguish between functional and non-functional sites, and predict very large numbers of false positives.
The thesis focuses on the development of a novel computational model, based on artificial neural networks (ANNs), for the identification of functional TFBSs, and the CRMs within which they operate in the human genome. Datasets of 12,239 experimentally verified true positive (TP) TFBSs and 130,199 false positive (FP) TFBSs were extracted using a combination of position weight matrices from the JASPAR database and experimentally verified sites from the Encyclopedia of DNA Elements (ENCODE). A number of machine learning algorithms were tested using a range of genetic information including gene expression, nucleosome positioning, DNA methylation states and DNA entropy. The best model, which gave a mean area under the receiver operating characteristic curve of 0.800, was based on a feedforward ANN trained using backpropagation.
This model was then used to predict functional TFBSs in a number of gene sets from the human genome. The predictions, combined with experimentally proven TFBSs from ENCODE, were used to investigate combinatorial patterns of TFBSs operating in CRMs. CRM patterns have been analysed in disease-associated genes located in linkage disequilibrium blocks containing SNPs obtained from Genome Wide Association Studies (GWAS).
The potential for the model to make functional TFBS predictions to aid in the annotation of orphan genes of unknown function is discussed. In addition, this thesis presents computational work on a number of smaller published studies.
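The abstract above mentions position weight matrices as the starting point for candidate TFBSs. As a rough illustration of that step only (not of the thesis's ANN model), the sketch below scans a sequence with a log-odds matrix. The count matrix `COUNTS` is invented for the example and is not a JASPAR profile; the pseudocount and the uniform background of 0.25 are likewise assumptions.

```python
import math

# Hypothetical 4-position count matrix (10 aligned sites); consensus is AGCT.
COUNTS = {
    "A": [8, 1, 0, 1],
    "C": [1, 0, 9, 1],
    "G": [0, 8, 1, 1],
    "T": [1, 1, 0, 7],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into per-position log2 odds against a uniform background."""
    width = len(next(iter(counts.values())))
    matrix = []
    for pos in range(width):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        matrix.append(
            {
                base: math.log2((counts[base][pos] + pseudocount) / total / background)
                for base in "ACGT"
            }
        )
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns a list of (start, score)."""
    width = len(matrix)
    return [
        (start, sum(matrix[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, matrix):
    """Return the (start, score) of the highest-scoring window."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])
```

A scan like this reports every window whose score passes a threshold, whether or not the site is functional in vivo, which is the false-positive problem the thesis sets out to address with additional genomic features.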
Modelling genetic networks involved in the activity-dependent modulation of adult neurogenesis
Neurogenesis, the production of new neurons, is restricted in the adult brain of mammals to only a few regions. One of these sites of adult neurogenesis is the hippocampus, a structure essential for many types of learning. A pool of stem cells that proliferate and can differentiate into new neurons, astrocytes and oligodendrocytes is maintained in the subgranular zone of the hippocampal dentate gyrus. Regulation of adult hippocampal neurogenesis occurs in response to environmental stimuli and is under the control of many genes.
This work employs high-throughput gene expression technologies to identify these genes and their interactions with each other and the neurogenesis phenotype. Harnessing variation from genetic, environmental and temporal sources, a multi-faceted dataset has been generated which offers a multidimensional view of the neural precursor proliferation phenotype. Networks of gene-gene and gene-phenotype interactions have been described and merged into a multilayer resource. A core subnetwork derived from modules recurring in the different layers has been identified using the proliferation phenotype as a seed. This subnetwork has suggested novel genes and interactions potentially involved in the regulation of adult hippocampal neurogenesis.
Table of contents:
Zusammenfassung (summary in German)
Abstract
Acknowledgements
Contents
Preface
General Introduction
Adult Neurogenesis
Historical setting
Neurogenesis exists in two regions of the adult mammalian brain
Implications of neurogenesis in the hippocampus
The Hippocampal Formation
Function of the hippocampus in learning and memory
The functional role of adult neurogenesis
Anatomy of the hippocampal formation
Neural Precursor Biology
The subgranular zone as a neurogenic niche
Neuronal maturation is a multi-step pathway
Regulation of Adult Neurogenesis
Neurogenesis is modulated by age
Neurogenesis is modulated by environmental factors
Neurogenesis is modulated by genetic background
Genetics of the BXD RI Cross
C57BL/6 and DBA/2
Recombinant Inbred Lines
The BXD panel
Quantitative genetics
Microarray Analysis
The concept of 'whole genome' expression analysis
Technical considerations
Theoretical considerations
Current Analytical Methods
Network Analysis
Network Description and Terminology
Graph Theory
Multiple-Network Comparison
Biological networks
Types of Biological Network
Sources of Network Data
Biological Significance of Networks
Aim of the current work
Methods and Materials
Animals
BXD panel
Progenitor strains
Animal behaviour
Running wheel activity
Enriched environment
Morris water maze
Open field test
Corticosterone assay
Histology
Tissue collection
BrdU staining
Statistics
Cell culture
Maintenance and differentiation
Immunostaining
RNA isolation
Microarray processing
Affymetrix arrays
M430v2 probe reannotation
Illumina arrays
Illumina probe reannotation
Bioinformatics
Translating the STRING network
QTL mapping
Network graph layout
Triplot
Enrichment analysis
Mammalian Adult Neurogenesis Gene Ontology
Introduction
Results
The cell stage ontology
The process ontology
Genes known to regulate hippocampal adult neurogenesis
Enrichment analysis
The MANGO gene network
Discussion
Hippocampal Coexpression Networks from the BXD Panel
Introduction
Results
Variation and covariation of gene expression across a panel of inbred lines
A hippocampal expression correlation network
Diverse neurogenesis phenotypes associate with discrete transcript networks
Discussion
Interactions Between Gene Expression Phenotypes and Genotype
Introduction
Results
QTL analysis and interval definitions
Pleiotropic loci and 'trans-bands'
Transcript expression proxy-QTLs can help in dissection of complex phenotypes
Interaction network
Discussion
Strain-Dependent Effects of Environment
Introduction
Results
Effects of strain and environment on precursor cell proliferation
Effects of strain and environment on learning behaviour
Transcript expression associated with different housing environments
Strain differences in transcript regulation
Distance-weighted coexpression networks
Discussion
Expression Time Course from Differentiating Cell Culture
Introduction
Results
Differentiation of proliferating precursors into neurons in vitro
Transcripts associated with stages of differentiation
Early events in NPC differentiation
A network of transcript coexpression during in vitro differentiation
Discussion
Integrated Gene Interaction Networks
Introduction
Results
Description of network layers
Merging of network layers to a multigraph
A network of genes controls neural precursor proliferation in the adult hippocampus
Novel candidate regulators of adult hippocampal neurogenesis
Novel pathways regulating adult hippocampal neurogenesis
Discussion
General Discussion
References
Selbständigkeitserklärung (declaration of authorship)
The development of novel methods for the targeting and manipulation of neural circuits in vivo
Neural networks are at the core of the brain's ability to compute complex responses to our external environment. Clinically, network dysfunction is emerging as a key component of several psychiatric and neurodegenerative disorders such as Alzheimer's disease or schizophrenia. However, our ability to precisely and safely manipulate neural networks for research and deliver network-specific therapy remains limited. To address this problem, our lab recently developed a monosynaptically restricted Self-Inactivating Rabies virus (SiR) which enables the targeting of neural circuits without cytotoxicity. To expand the scope of SiR we further developed the technology in two directions: A) By incorporating the CRISPR/CAS9 gene-editing machinery into the SiR genome to successfully edit endogenous loci in vitro and in vivo. B) By designing an improved second generation SiR virus (SiR 2.0) which applies the same SiR technology to a challenge rabies strain (CVS-N2C). SiR 2.0 demonstrates increased neurotropism, increased trans-synaptic transfer efficiency and markedly decreased immunogenicity compared to the SiR 1.0 vector.
These advancements expand the scope of SiR viruses to be used in the genome-editing of circuits in vivo. In physiology, a combined SiR 2.0 CAS9 virus allows us to investigate the roles of genes within circuits in the brain function of live animals. For therapy, it paves the way for the rabies virus' potential use to edit disease-related genes in dysfunctional circuits. Despite the circuit basis of many neurological disorders, existing gene therapy vectors are not circuit specific. In addition, the practical difficulties of delivering therapeutic agents at high doses into the central nervous system exacerbate our inability to achieve high therapeutic loads in affected circuits. In contrast, a SiR 2.0 CAS9 virus would, following injection into peripheral organs, trans-synaptically spread into desired circuits of the central nervous system that are affected in neurological disease (e.g. networks demonstrating pathological protein propagation in neurodegenerative disorders) and edit disease-related genes.
Lastly, our interest in network-level pathological protein propagation also led us to investigate the biology behind this observation. Due to additional evidence that a significant number of other proteins in physiology also show interneuronal movement, we hypothesised that perhaps this is an overlooked phenomenon in neurobiology which could have key implication
Protein-protein docking for interactomic studies and its application to personalized medicine
Proteins are the embodiment of the message encoded in the genes and they act as the building blocks and effector part of the cell. From gene regulation to cell signalling, as well as cell recognition and movement, protein-protein interactions (PPIs) drive many important cellular events by forming intricate interaction networks. The number of all non-redundant human binary interactions, forming the so-called interactome, ranges from 130,000 to 650,000 interactions as estimated by different studies. In some diseases, like cancer, these PPIs are altered by the presence of mutations in individual proteins, which can change the interaction networks of the cell resulting in a pathological state. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. To understand how these mutations can alter the PPIs, we need to look at the three-dimensional structure of the protein complexes at the atomic level. However, there are available structures for less than 10% of the estimated human interactome. Computational approaches such as protein-protein docking can help to extend the structural coverage of known PPIs. In the protein-protein docking field, rigid-body docking is a widely used approach, since it is fast, computationally cheap and often capable of generating a pool of models within which a near-native structure can be found. These models need to be scored in order to select the acceptable ones from the set of poses. In the present thesis, we have characterized the synergy between combinations of protein-protein docking methods and several scoring functions.
Our findings provide guidance on the most efficient scoring function for each docking method and can inform future efforts to develop scoring functions. We then used docking calculations to predict interaction hotspots, i.e. residues that contribute the most to the binding energy, and interface patches, by including neighbour residues in the predictions. We developed and validated a method based on the Normalized Interface Propensity (NIP) score. The work of this thesis has extended the original NIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify previously unknown edgetic effects in many nsSNPs. This methodology was also applied to predict the location of 14,551 nsSNPs in 4,254 proteins, for more than 12,000 interactions without 3D structure. We found that 34% of the disease-associated nsSNPs were located at a protein-protein interface. This opens future opportunities for the high-throughput characterization of pathological mutations at atomic-level resolution, and can help to design novel therapeutic strategies to re-stabilize the PPIs affected by disease-associated nsSNPs.