
    Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments

    Signaling cascades are triggered by environmental stimulation and propagate the signal to regulate transcription. Systematic reconstruction of the underlying regulatory mechanisms requires pathway-targeted, informative experimental data. However, practical experimental design approaches are still in their infancy. Here, we propose a framework that iterates design of experiments and identification of regulatory relationships downstream of a given pathway. The experimental design component, called MEED, aims to minimize the amount of laboratory effort required in this process. To avoid ambiguity in the identification of regulatory relationships, the choice of experiments maximizes diversity between expression profiles of genes regulated through different mechanisms. The framework takes advantage of expert knowledge about the pathways under study, formalized in a predictive logical model. By considering model-predicted dependencies between experiments, MEED is able to suggest a whole set of experiments that can be carried out simultaneously. Our framework was applied to investigate interconnected signaling pathways in yeast. In comparison with other approaches, MEED suggested the most informative experiments for unambiguous identification of transcriptional regulation in this system.
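    The design criterion above selects experiments so that genes regulated through different mechanisms end up with distinguishable expression profiles. The greedy sketch below illustrates that idea. It is not MEED's actual objective or search procedure. It assumes the logical model has already been reduced to a hypothetical table of predicted binary activities, one per mechanism and candidate experiment. At each step it adds the experiment that separates the most mechanism pairs, scored together with the experiments already chosen.

```python
from itertools import combinations


def distinguished_pairs(predictions, experiments):
    """Count mechanism pairs whose predicted profiles differ in at least one experiment."""
    return sum(
        1
        for m1, m2 in combinations(predictions, 2)
        if any(predictions[m1][e] != predictions[m2][e] for e in experiments)
    )


def greedy_design(predictions, candidates, budget):
    """Greedily pick up to `budget` experiments that maximize distinguished pairs."""
    selected = []
    for _ in range(budget):
        remaining = [e for e in candidates if e not in selected]
        if not remaining:
            break
        # Score each candidate jointly with the experiments already chosen,
        # so an experiment that is redundant with them adds nothing.
        best = max(remaining, key=lambda e: distinguished_pairs(predictions, selected + [e]))
        selected.append(best)
    return selected


# Hypothetical model predictions: mechanism -> {experiment: predicted activity (0/1)}.
predictions = {
    "mech_A": {"wt": 1, "ko_kinase": 0, "stress": 1},
    "mech_B": {"wt": 1, "ko_kinase": 1, "stress": 0},
    "mech_C": {"wt": 1, "ko_kinase": 0, "stress": 0},
}
design = greedy_design(predictions, ["wt", "ko_kinase", "stress"], budget=2)
print(design)  # ['ko_kinase', 'stress']
```

    Each candidate is scored together with the experiments already selected. This means the sketch favors complementary experiments over redundant ones, which loosely echoes the idea of proposing a whole set at once. All mechanism and experiment names are invented.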

    Exploration of large molecular datasets using global gene networks: computational methods and tools

    Defining gene expression profiles and mapping complex interactions between molecular regulators and proteins is key to understanding biological processes and the functional properties of cells, and is therefore the focus of numerous experimental studies. Small-scale biochemical analyses deliver high-quality data but lack coverage, whereas high-throughput sequencing reveals thousands of interactions that can be error-prone and require appropriate computational methods to identify true relations. Furthermore, these approaches usually focus on one type of interaction at a time, which makes experimental mapping of the genome-wide network a costly and time-intensive procedure. In the first part of the thesis, I present the network analysis tools I developed for exploring large-scale datasets in the context of a global network of functional coupling. Paper I introduces NEArender, a method that performs pathway analysis and determines the relations between gene sets using a global network. Traditionally, pathway analysis did not consider network relations and thereby covered only a minor part of the whole picture. Placing gene sets in the context of a network provides additional information for pathway analysis and reveals a more comprehensive picture. Paper II presents EviNet, a user-friendly web interface to the NEArender algorithm. Users can either input gene lists or manage and integrate highly complex experimental designs via an interactive Venn diagram-based interface. The web resource provides access to biological networks and pathways from multiple public resources or the user's own. The analysis typically takes seconds to minutes, and the results are presented in graphical and tabular formats. Paper III describes NEAmarker, a method to predict anti-cancer drug targets from enrichment scores calculated by NEArender, thus demonstrating a practical use of the network enrichment tool. 
The method can integrate data from multiple omics platforms to model drug sensitivity with enrichment variables. In parallel, alternative methods for pathway enrichment analysis were benchmarked in the paper. The second part of the thesis focuses on identifying the spatial and temporal mechanisms that govern the formation of neural cell diversity in the developing brain. High-throughput RNA- and ChIP-sequencing platforms were applied to provide data for studying the underlying biological hypotheses at the genome-wide scale. In Paper IV, I defined the role of the transcription factor Foxa2 during the specification and differentiation of floor plate cells of the ventral neural tube. RNA-seq analyses of Foxa2-/- cells identified a large set of candidate genes involved in floor plate differentiation. Analysis of a Foxa2 ChIP-seq dataset suggested that Foxa2 directly regulates more than 250 genes expressed by the floor plate and identified Rfx4 and Ascl1 as co-regulators of many floor plate genes. Experimental studies suggested a cooperative activator function for Foxa2 and Rfx4 and a suppressive role for Ascl1 in spatially constraining floor plate induction. Paper V addresses how time is measured during the sequential specification of neurons from multipotent progenitor cells in the developing ventral hindbrain. By integrating experimental data with computational modeling, an underlying timer circuitry that leads to the sequential generation of motor neurons and serotonergic neurons was identified.
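    Paper I places gene sets in a global network by asking whether two sets share more links than expected. The sketch below shows one common way to frame that comparison. It counts the observed links between an altered gene set (`ags`) and a functional gene set (`fgs`). It then compares that count with the number expected when links are placed at random while preserving node degrees. The function name, the toy network and the Poisson-style z-score are illustrative assumptions. The exact statistical test NEArender applies is not described here.

```python
import math


def network_enrichment(edges, ags, fgs):
    """Observed vs degree-expected links between two gene sets.

    Assumes an undirected network without duplicate edges or self-loops,
    and disjoint gene sets `ags` and `fgs`.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    # Links with one end in each gene set.
    observed = sum(
        1 for a, b in edges
        if (a in ags and b in fgs) or (a in fgs and b in ags)
    )
    # Expected count if links were rewired at random with node degrees preserved.
    deg_ags = sum(degree.get(g, 0) for g in ags)
    deg_fgs = sum(degree.get(g, 0) for g in fgs)
    expected = deg_ags * deg_fgs / (2 * len(edges))
    # Simple Poisson-style z-score (an assumption, not NEArender's own test).
    z = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
    return observed, expected, z


edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "d")]
observed, expected, z = network_enrichment(edges, ags={"a", "b"}, fgs={"x", "y"})
print(observed, expected)  # 3 1.125
```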

    Network-based identification of driver pathways in clonal systems

    Highly ethanol-tolerant bacteria for the production of biofuels, bacterial pathogens that are resistant to antibiotics, and cancer cells are examples of phenotypes that are important to society and are currently being studied. To better understand these phenotypes and their underlying genotype-phenotype relationships, it is now commonplace to investigate DNA and expression profiles using next-generation sequencing (NGS) and microarray techniques. These techniques generate large amounts of omics data, which result in lists of genes whose mutations or expression profiles potentially contribute to the phenotype. These lists often include a multitude of genes and are troublesome to verify manually, because literature studies and wet-lab experiments for a large number of genes are very time- and resource-consuming. Computational methods are therefore required that can narrow these gene lists down by removing the false positives they typically contain in abundance. Ideally, such methods also provide additional information on the relationships between the selected genes. Other high-throughput techniques, such as yeast two-hybrid (Y2H), ChIP-seq and ChIP-chip, as well as a myriad of small-scale experiments and predictive computational methods, have generated a wealth of interactomics data over the last decade, most of which is now publicly available. These data can be combined into a biological interaction network, which contains all molecular pathways an organism can use and is thus the equivalent of the organism's blueprint. The omics data obtained from experiments can then be integrated with this network. Biological interaction networks are key to the computational methods presented in this thesis, as they enable these methods to account for important relations between genes (and gene products). 
In doing so, it is possible not only to identify interesting genes but also to uncover molecular processes important to the phenotype. The best way to analyze omics data from a phenotype of interest varies widely with the experimental setup and the available data, so multiple methods were developed and applied in the context of this thesis. In a first approach, an existing method (PheNetic) was applied to a consortium of three bacterial species that together can efficiently degrade a herbicide, although none of the species can do so efficiently on its own. For each species, expression data (RNA-seq) were generated both in the consortium and in isolation. PheNetic identified molecular pathways that were differentially expressed and likely contribute to a cross-feeding mechanism between the species in the consortium. Having obtained proof of concept, PheNetic was adapted to cope with experimental evolution datasets in which genomics data were available in addition to expression data. Two publicly available datasets were analyzed: amikacin resistance in E. coli and coexisting ecotypes in E. coli. The results revealed both well-known and newly identified molecular pathways involved in these phenotypes. Experimental evolution sometimes generates datasets containing mutator phenotypes, which have high mutation rates. These datasets are hard to analyze because of the large amount of noise (most mutations have no effect on the phenotype). To address this, IAMBEE was developed. IAMBEE can analyze genomic datasets from evolution experiments even if they contain mutator phenotypes. IAMBEE was tested using an E. coli evolution experiment in which cells were exposed to increasing concentrations of ethanol, and the results were validated in the wet lab. In addition to methods for analyzing causal mutations and mechanisms in bacteria, a method for identifying causal molecular pathways in cancer was developed. 
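    The general idea of integrating mutated and differentially expressed genes with an interaction network to expose connecting molecular pathways can be illustrated with a deliberately simple stand-in. The sketch takes the union of shortest paths from each mutated gene to each expressed gene. PheNetic's actual subnetwork inference is more elaborate and is not reproduced here. The gene names, the function names and the choice of shortest paths are all assumptions for illustration.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search returning one shortest path, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(adj.get(node, ())):  # sorted for reproducible tie-breaking
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def connecting_subnetwork(edges, mutated, expressed):
    """Union of shortest paths linking each mutated gene to each expressed gene."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    subnetwork = set()
    for m in sorted(mutated):
        for d in sorted(expressed):
            path = shortest_path(adj, m, d)
            if path:
                # Store undirected edges in a canonical (sorted) orientation.
                subnetwork.update(tuple(sorted(pair)) for pair in zip(path, path[1:]))
    return subnetwork


# Hypothetical network: a mutated gene upstream of a regulator of two expressed genes.
edges = [("mutA", "tf1"), ("tf1", "geneX"), ("tf1", "geneY"), ("other", "geneZ")]
print(sorted(connecting_subnetwork(edges, {"mutA"}, {"geneX", "geneY"})))
```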
As bacteria and cancer cells are both clonal, they can be treated similarly in this context. The main differences are the amount of data available (many more samples are available for cancer) and the fact that cancer is a complex and heterogeneous phenotype. We therefore developed SSA-ME, which builds on the concept that a causal molecular pathway carries at most one mutation in a cancer cell (mutual exclusivity). However, enforcing this criterion is computationally hard. SSA-ME is designed to cope with this problem and to search for mutually exclusive patterns in relatively large datasets. SSA-ME was tested on cancer data from the TCGA Pan-Cancer dataset. In addition to already known molecular pathways and mutated genes, the results allowed us to predict the involvement of a few rarely mutated genes. (246 pages; status: published)
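    The mutual-exclusivity criterion can be made concrete with a simple score that rewards gene sets covering many samples while penalizing samples mutated in more than one gene of the set. The sketch uses the weight popularized by the Dendrix method, W(M) = 2·|covered samples| − Σ(samples per gene). This is a different technique from SSA-ME, whose own scoring and search are not reproduced here. The exhaustive search over gene sets grows combinatorially, which is why it is only usable on toy data. That cost is also why a dedicated search strategy is needed on datasets the size of TCGA. The gene and sample identifiers are hypothetical.

```python
from itertools import combinations


def exclusivity_weight(mutations, genes):
    """Dendrix-style weight: 2 * |covered samples| - sum of per-gene mutated samples.

    `mutations` maps gene -> set of sample IDs in which the gene is mutated.
    A perfectly exclusive set scores its coverage; each extra mutation in an
    already covered sample lowers the score by one.
    """
    covered = set()
    total = 0
    for gene in genes:
        samples = mutations.get(gene, set())
        covered |= samples
        total += len(samples)
    return 2 * len(covered) - total


def best_gene_set(mutations, size):
    """Exhaustive search over all gene sets of a given size (toy data only)."""
    return max(combinations(sorted(mutations), size),
               key=lambda genes: exclusivity_weight(mutations, genes))


mutations = {"G1": {"s1", "s2"}, "G2": {"s3"}, "G3": {"s1", "s3"}}
print(best_gene_set(mutations, 2))  # ('G1', 'G2')
```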

    Computational Proteomics Using Network-Based Strategies

    This thesis examines the productive application of networks to proteomics, with a specific biological focus on liver cancer. Contemporary (shotgun) proteomics is plagued by coverage and consistency issues, which can be addressed via network-based approaches. Three classes of network-based approaches are examined: a traditional cluster-based approach termed the Proteomics Expansion Pipeline (PEP), a generalization of PEP termed Maxlink, and a feature-based approach termed Proteomics Signature Profiling (PSP). PEP improves on prevailing cluster-based approaches. It uses a state-of-the-art cluster identification algorithm as well as network-cleaning approaches to identify the critical network regions indicated by the liver cancer dataset. The top PARP1-associated cluster was identified and independently validated. Maxlink identifies undetected proteins based on their number of links to identified differential proteins. It is more sensitive than PEP because its requirements are more relaxed. Here, novel roles of ARRB1/2 and ACTB are identified and discussed in the context of liver cancer. Neither PEP nor Maxlink can deal with consistency issues. PSP is the first method able to deal with both coverage and consistency. It is termed feature-based because the network-based clusters it uses are predicted independently of the data. It can also use real complexes or predicted pathway subnets. By combining pathways and complexes, a novel basis of liver cancer progression was identified, implicating nucleotide pool imbalance aggravated by mutations in key DNA repair complexes. Finally, comparative evaluations suggested that pure network-based methods are vastly outperformed by feature-based network methods that use real complexes. This indicates that the quality of current networks is insufficient to provide strong biological rigor for data analysis, and that networks should be carefully evaluated before further validation.
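    The Maxlink idea can be sketched directly: rank proteins the experiment did not detect by how many network links they have to the identified differential proteins. The abstract does not state Maxlink's exact threshold or significance test. The `min_links` cutoff below is therefore a hypothetical parameter, and the protein identifiers are invented.

```python
from collections import Counter


def maxlink_candidates(edges, detected, differential, min_links=2):
    """Rank undetected proteins by their number of links to differential proteins.

    Assumes an undirected network without duplicate edges; `differential`
    is a subset of `detected`. `min_links` is a hypothetical cutoff.
    """
    counts = Counter()
    for a, b in edges:
        # Count a link only when one end is differential and the other was not detected.
        if a in differential and b not in detected:
            counts[b] += 1
        if b in differential and a not in detected:
            counts[a] += 1
    return sorted(
        ((protein, n) for protein, n in counts.items() if n >= min_links),
        key=lambda item: (-item[1], item[0]),
    )


edges = [("D1", "U1"), ("D2", "U1"), ("D3", "U1"), ("D1", "U2"),
         ("D2", "U2"), ("D1", "U3"), ("D1", "D2")]
detected = {"D1", "D2", "D3", "N1"}
differential = {"D1", "D2", "D3"}
print(maxlink_candidates(edges, detected, differential))  # [('U1', 3), ('U2', 2)]
```

    Relaxing the requirement from membership in a dense cluster (as in PEP) to a simple link count is what makes this kind of ranking more sensitive. It also makes the ranking more dependent on the quality of the underlying network.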

    Structurally Primed Phage Display Libraries

    Therapeutic monoclonal antibodies (mAbs) are one of the main drivers of revenue in the pharmaceutical market. Regardless of the origin and platform used, monoclonal antibodies generated against a given target may have room for improvement. In vitro affinity maturation libraries aim to surpass the throughput limitations of classical affinity maturation approaches guided by X-ray crystallography. They do so by providing a generalizable (or blind) approach that can be applied to many candidates. Current blind methods do not always ensure that synergistic mutations are found and may not respect the structural constraints of the IgG molecule in question. Ideally, innovative affinity maturation methods should be generalizable, providing high-throughput results while maintaining a certain degree of specificity towards the antibody structure being considered. Such methods therefore require attention to specific regions, such as those likely to be in contact with the antigen or those that influence the antibody's structural integrity and overall developability. (...)

    The scent of genome complexity: exploring genomic instability in mouse Olfactory Epithelium

    In the olfactory epithelium (OE), volatile compounds (odors) are detected by a large family of olfactory receptors (ORs) located on the surface of the cilia of olfactory sensory neurons (OSNs). OSNs represent the major sensory component of the OE and reside in the nasal cavity. The extraordinary chemical diversity of olfactory ligands is matched by a repertoire of more than 1,200 active OR genes in mouse and 350 in human, all encoding G-protein-coupled receptors (GPCRs). Each mature OSN in the OE is thought to express only one allele of a single OR gene (monoallelic and monogenic expression). A given OR gene is expressed in a mosaic or punctate pattern of OSNs within a characteristic zone of the OE. The transcriptional mechanisms underlying this extraordinarily tight regulation of gene expression remain unclear. I hypothesize that OR expression choice can be influenced by somatic LINE-1-associated genomic variations. Indeed, it is now well established that active LINE-1s can create genomic rearrangements at insertional and post-insertional stages. Besides promoting genome plasticity and diversification during evolution, somatic variations can contribute to the regulation of genes characterized by stochastic and monoallelic expression. Under this hypothesis, I expect the genomic sequence around an expressed OR to differ from that around the same OR in non-expressing cells. The difference would come from variations able to activate chromatin and promote OR transcription. I first showed high LINE-1 expression and retrotransposition in the OE. I then investigated the presence of LINE-1-associated variations and their involvement in OR expression by comparing the genomic sequence around an active and an inactive OR locus. In particular, I analyzed a 50 kb genomic region around the Olfr2 TSS, taking advantage of a GFP knock-in mouse. 
In these mice, the OSNs that naturally express Olfr2 also co-express GFP. Targeted sequencing of the Olfr2 locus revealed hundreds of heterozygous structural variants (insertions, deletions, inversions and duplications) in the vicinity of the locus, with deletions being the most abundant category. By end-point PCR, I validated six LINE-1-associated deletions potentially involved in Olfr2 expression. Functional validation experiments in vivo will, however, be performed to prove their actual role in Olfr2 choice. To explore the mechanisms that might generate the deletions, I began investigating a possible involvement of DNA double-strand breaks (DSBs). To this end, I performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) for endogenous gamma-H2AX, an early response marker for DSBs, in mouse OE and liver. This provided a general characterization of endogenous gamma-H2AX in normal tissues. In both tissues, the gamma-H2AX signal was not randomly distributed in the genome but preferentially localized within transcribed and regulatory regions. Overall, gamma-H2AX peaks were depleted in the OR clusters. A notable exception was a peak located within the Olfr2 locus, close to two validated deleted regions.
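    A statement such as "peaks were depleted in the OR clusters" typically rests on comparing how many peaks fall inside the clusters with how many would be expected if peaks landed uniformly at random. The sketch computes that observed/expected ratio on a single sequence using half-open intervals. The coordinates are hypothetical, not data from the thesis. A real analysis would use genome-wide annotations and a permutation or similar test to assess significance, neither of which is shown here.

```python
def in_any(position, intervals):
    """True if position falls in any half-open [start, end) interval."""
    return any(start <= position < end for start, end in intervals)


def cluster_enrichment(peaks, clusters, sequence_length):
    """Observed/expected ratio of peak midpoints falling inside clusters.

    All intervals are half-open (start, end) pairs on a single sequence;
    clusters are assumed not to overlap. Ratios below 1 indicate depletion.
    """
    midpoints = [(start + end) // 2 for start, end in peaks]
    observed = sum(in_any(m, clusters) for m in midpoints) / len(midpoints)
    # Expected fraction if peaks landed uniformly at random along the sequence.
    expected = sum(end - start for start, end in clusters) / sequence_length
    return observed / expected


# Hypothetical coordinates, not data from the thesis.
peaks = [(10, 20), (600, 610), (700, 710), (800, 810)]
clusters = [(0, 500)]
print(cluster_enrichment(peaks, clusters, sequence_length=1000))  # 0.5
```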