20 research outputs found

    Natural sequence variation as a tool to dissect gene expression regulation in Drosophila melanogaster

    Genetic variation is a major cause of differences between individuals and it represents a powerful tool to study gene regulation. By interfering with cis-Regulatory Modules (CRMs), variants can unravel CRM function. On the other hand, predicting the effect of variants on phenotype from the DNA sequence has proven challenging. In this thesis, I use Drosophila embryonic development as a model system to study diversity in gene regulation at the transcriptional level. CRMs can be characterized using multiple genome-wide techniques such as DNase hypersensitivity. However, despite having comprehensive CRM maps, it is still difficult to predict which genes each CRM regulates. Functional methods, such as mutagenesis, are effective but poorly scalable. To address this issue, I developed an eQTL method (called DHS-eQTL) that makes use of naturally occurring genetic variation to associate CRMs with the genes they regulate. The results reveal 2,967 DHS-eQTLs and indicate a high extent of CRM sharing between genes. We validated the results with in silico and in vitro approaches and I discuss upcoming in vivo experiments. We observed long-range enhancer regulation, suggesting that commonly used methods to associate genes and enhancers underestimate their distance. Also, the DHS-eQTLs show that promoter-proximal CRMs have widespread distal activity. The separation between populations causes an increase in genetic differences by drift and adaptation to different environments. We investigated gene expression differences between Drosophila populations from five continents by performing RNA-Seq on 80 inbred fly lines. We performed multiple quality-control tests to ensure that the gene expression dataset is of high quality. Gene expression profiles show detectable diversity among the fly lines from different continents and confirm what has been observed at the genetic level. 
In particular, the African population is the most separated, while the American, European and Australian ones show less diversity. In addition, we identified 903 gene and 2,021 exon eQTLs. Genetic variants can interfere with Transcription Factor Binding Sites (TFBS) and this might, in turn, lead to changes in chromatin accessibility. We applied LS-GKM (an SVM method that uses gapped k-mers) to learn sequence features of tissue-specific accessible chromatin and predict the impact of natural sequence variation on accessibility. We trained LS-GKM on six tissue-specific training sets: neuroectodermal, mesodermal and double negative CRMs, each divided into promoter-proximal and promoter-distal. The method unbiasedly recovers tissue-specific TFBS and shows good performance despite the small training sets. Finally, we scored variants from groups of inbred Drosophila lines. Interestingly, rare variants have a higher impact on accessibility.

    Glycosylation in Tribolium castaneum : composition, physiological significance and exploitation for pest control

    The majority of proteins undergo glycosylation. This post-translational modification of proteins is involved in numerous biological processes, and erroneous glycosylation is often lethal. Following this logic, interference with insect glycosylation is likely to be an effective way to control insect pests. Unfortunately, most of the knowledge on insect glycobiology comes from research on Drosophila, which lacks relevance in the context of pest insect control. This work focused on the discovery of the physiological importance of N-glycosylation in the red flour beetle, Tribolium castaneum, which is a pest and a model insect. Additionally, this PhD thesis investigates the use of glycan-binding proteins (lectins) and the disruption of N-glycosylation as control strategies against pest beetles. Lectins have high insecticidal activity against insect cells, but when fed to the red flour beetle their efficiency was greatly impaired by susceptibility to proteolysis, low efficiency of passing through the peritrophic matrix and inefficient transport to the hemolymph. These factors restricting the insecticidal properties of lectins could be generalized to virtually all insecticidal proteins. Therefore, these data can be used for a more rational selection of novel insecticidal toxins and enhancement of the activity of the currently used ones. By studying glycan composition, gene expression analysis and functional genomics, it was determined that N-glycosylation is involved in insect metamorphosis. Regulated production of N-glycans was crucial for larval growth, progression of the life stages and development of adult appendages. Finally, disruption of the early stages of the N-glycosylation pathway appears to be a promising strategy for future control of insect pests.

    Multi-Label Dimensionality Reduction

    Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable only under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed for the case where the data arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation. 
The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms. Dissertation/Thesis, Ph.D. Computer Science 201

    Mechanisms Driving Karyotype Evolution and Genomic Architecture

    Understanding the origin of species and their adaptability to new environments is one of the main questions in biology. This is fueled by the ongoing debate on species concepts and facilitated by the availability of an unprecedentedly large number of genomic resources. Genomes are organized into chromosomes, where significant variations in number and morphology are observed among species due to large-scale structural variants such as inversions, translocations, fusions, and fissions. This genomic reshuffling provides, in the long term, new chromosomal forms on which natural selection can act, contributing to the origin of biodiversity. This book contains mainly articles, reviews, and an opinion piece that explore numerous aspects of genome plasticity among taxa that will help in understanding the dynamics of genome composition, the evolutionary relationships between species and, in the long run, speciation.

    Genes: Multigene Families, Control of Gene Expression, Genetic contributions to Human Diseases, including Chromosomal Fragile Sites and ‘Dynamic’ and ‘Non-self’ Mutations

    The early work in this thesis utilizes the general approach of comparative analysis. In order to find out the relationship between entities (either functional or genetic), my colleagues and I have attempted to identify the important elements by detecting similarity between those entities that act in a similar manner. The philosophy behind this approach is simply that when two distinct objects perform a similar process, the requirements essential for that process will be revealed as similarities between those objects above a noise of difference between them. The use of comparative analysis in biological systems is an attempt to identify natural order from apparent chaos. This work includes, but is not limited to: 1. discovery of the family of kallikrein genes and exploration of their roles in biology, 2. identification of the DNA sequence elements required for hormonal and heavy metal control of metallothionein gene expression, 3. discovery of at least some of the necessary and sufficient conditions for the appearance of fragile sites on chromosomes, and their consequent contributions to disease, and 4. the molecular properties of repeat DNA sequence expansion that lead to dynamic mutation and consequent fragile site expression and/or disease pathogenesis. In a sense, the use of genetic animal models to study gene function and pathogenesis follows a similar logic of comparative analysis: the mutation of a single endogenous gene, or the expression of a single introduced mutated gene, in a (presumed) constant genetic background enables the biological consequences of the genetic mutation or aberrant gene expression to be determined by comparing animals from the ‘wild-type’ or parent line with those that now carry the mutation or altered gene. 
This approach has been utilized in the most recent work contained herein as a means to determine gene function and/or to model human genetic disease pathogenesis, specifically pathogenic mechanisms of the protein WWOX in cancer and expanded repeat RNAs in neurodegenerative diseases. The culmination of this recent work is the development of a hypothesis – 4. that expanded repeat double-stranded RNA leads to neurodegeneration through its recognition by the RNA-binding pattern recognition receptors as a ‘non-self’ or foreign nucleic acid due to a paucity of RNA modification. The resultant pathogenic mechanism is therefore autoinflammatory disease. Given the wide range and variety of evidence of inflammatory activation in neurodegenerative diseases in general, this mechanism is hypothesized to be the general causal mechanism for most (or all) of these diseases. A specific Introduction, highlighting the nature and significance of the work, and a Conclusion, describing how this work has contributed to knowledge, are given at the start of each chapter, while the impact of the various components of this work is indicated by the number of citations for each of the included publications. Authorship contributions to each of the included publications in this work are also indicated with each specific reference. Thesis (DSc) -- University of Adelaide, School of Biological Sciences, 202

    Annual Report
