242 research outputs found

    IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

    Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation, making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences, allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
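The definition in this abstract (a sequence followed downstream by its reverse complement, optionally with a central gap and some mismatches) can be illustrated with a short Python sketch. This is a naive quadratic-time search written only for illustration and is not IUPACpal's algorithm. The function names, the default parameters, and the matching rule (two IUPAC symbols pair if any base encoded by one is complementary to any base encoded by the other) are assumptions.

```python
# IUPAC nucleotide codes mapped to the set of bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """True if some base encoded by x is complementary to some base encoded by y.

    Assumed matching rule for this sketch; a real tool may define it differently.
    """
    return any(COMPLEMENT[a] in IUPAC[y] for a in IUPAC[x])


def inverted_repeats(seq, min_len=4, max_gap=3, max_mismatches=0):
    """Return (left_arm_start, arm_length, gap) for each centre with a long enough arm.

    For every gap size and every centre, the arms are extended outwards
    for as long as the symbols can pair, tolerating up to max_mismatches.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for gap in range(max_gap + 1):
        # i is the last index of the left arm, j the first index of the right arm.
        for i in range(n - gap - 1):
            j = i + gap + 1
            length = mismatches = 0
            while i - length >= 0 and j + length < n:
                if not can_pair(seq[i - length], seq[j + length]):
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
                length += 1
            if length >= min_len:
                hits.append((i - length + 1, length, gap))
    return hits
```

For example, `inverted_repeats("GAAYTC", min_len=3, max_gap=0)` reports a single repeat with 3-base arms starting at position 0, because the ambiguity code Y (C or T) can pair with A. With `max_mismatches` above zero an arm may end on a mismatched position; a more careful version would trim those.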

    Abundance and diversity of endogenous retroviruses in the chicken genome

    Long terminal repeat (LTR) retrotransposons are autonomous eukaryotic repetitive elements which may elicit prolonged genomic and immunological stress on their host organism. LTR retrotransposons comprise approximately 10 % of the mammalian genome, but previous work identified only 1.35 % of the chicken genome as LTR retrotransposon sequence. This deficit appears inconsistent across birds, as studied Neoaves have contents comparable with mammals, although all birds contain only one LTR retrotransposon class: endogenous retroviruses (ERVs). One group of chicken-specific ERVs (Avian Leukosis Virus subgroup E; ALVEs) remain active and have been linked to commercially detrimental phenotypes, such as reduced lifetime egg count, but their full diversity and range of phenotypic effects are poorly understood. A novel identification pipeline, LocaTR, was developed to identify LTR retrotransposon sequences in the chicken genome. This enabled the annotation of 3.01 % of the genome, including 1,073 structurally intact elements with replicative potential. Elements were depleted within coding regions, and over 40 % of intact elements were found in clusters in gene sparse, poorly recombining regions. RNAseq analysis showed that elements were generally not expressed, but intact transcripts were identified in four cases, supporting the potential for viral recombination and retrotransposition of non-autonomous repeats. LocaTR analysis of seventy-two additional sauropsid genomes revealed highly lineage-specific repeat content, and did not support the proposed deficit in Galliformes. A second, novel bioinformatic pipeline was constructed to identify ALVE insertions in whole genome resequencing data and was applied to eight elite layer lines from Hy-Line International. Twenty ALVEs were identified and diagnostic assays were developed to validate the bioinformatic approach. Each ALVE was sequenced and characterised, with many exhibiting high structural intactness. 
In addition, a K locus revertant line was identified due to the unexpected presence of ALVE21, confirmed using BioNano optical maps. The ALVE identification pipeline was then applied to ninety chicken lines and 322 different ALVEs were identified, 81 % of which were novel. Overall, broilers and non-commercial chickens had a greater number of ALVEs than were found in layers. Taken together, these two analyses have enabled a thorough characterisation of both the abundance and diversity of chicken ERVs.

    Ferrous Iron Sensing and Responding in Pseudomonas aeruginosa

    Controlling iron distribution is important for all organisms, and is key in bacterial pathogenesis. It has long been understood that cystic fibrosis (CF) patient sputum contains elevated iron concentrations. However, anaerobic bacteria have been isolated from CF sputum and hypoxic zones in sputum have been measured. Because ferrous iron [Fe(II)] is stable in reducing, acidic conditions, it could exist in the CF lung. I show that a two-component system, BqsRS, specifically responds to Fe(II) in the CF pathogen, Pseudomonas aeruginosa. Concurrently, a clinical study found that Fe(II) is present in CF sputum at all stages of lung function decline. Fe(II), not Fe(III), correlates with patients in the most severe disease state. Furthermore, transcripts of the newly identified BqsRS were detected in sputum. Two-component systems are the main means by which bacteria interact with their extracellular environment. A typical two-component system contains a sensor histidine kinase, which upon activation phosphorylates a response regulator that then acts as a transcription factor to elicit a cellular response to stimuli. To explore the mechanism of BqsRS, I describe the Fe(II)-sensing RExxE motif in the sensor BqsS and determine the consensus DNA sequence that BqsR binds. With the BqsR binding sequence, I identify novel regulon members through bioinformatic and molecular biology techniques. From the predicted function of new BqsR regulon members, I find that Fe(II) elicits a response that globally protects the cells against cationic stressors, including clinically relevant antibiotics. Subsequently, I use BqsR as a case study to determine whether promoter outputs can be accurately predicted based only on a deep understanding of a transcriptional activator’s operator, or whether a broader regulatory context is required for accurate predictions at all genomic loci. This work highlights the importance of Fe(II) as a (micro)environmental factor, even in conditions typically thought of as aerobic. 
Since the presence of Fe(II) can alter P. aeruginosa’s antibiotic susceptibility, combining the current strategy of targeting Fe(III) with a new approach targeting Fe(II) may help eradicate infections in the CF lung in the future.

    Experimental and computational methods to assign gene function to maize genes

    Maize is an important crop species and is the highest produced cereal crop in the world as well as a model species for genetics and genomics research. For this reason, researchers have been very successful in translating understanding of basic biological processes into improved crops for over 100 years. Maize researchers have a long history of utilizing genetic techniques to dissect the function of genes that control biological processes. Characterizing and cloning mutants precisely defines gene function but is a slow process that can take years to accomplish. Alternatively, computational methods provide a faster way to assign predicted function to genes by leveraging the vast knowledge base of gene function gathered by experimental and curatorial efforts in multiple species. Computational methods can be used to predict functions for genes at a genome-wide scale. Ideally, improved computational predictions would narrow and target the experiments used to test gene function, thus speeding the process of experimental characterization. We have created methods to improve discrete steps in both experimental characterization and computational prediction of gene function in maize. For the experimental work, we have developed molecular methods, leveraging the decreasing cost of high-throughput sequencing, and bioinformatics analysis pipelines, capitalizing on the availability of multiple maize genome assemblies, that improve positional cloning of maize mutants. We have also focused on methods to improve identification of T-DNA integration locations genome-wide for maize. Genes responsible for mutant phenotypes are often studied using transgenic techniques to manipulate function at a molecular level. These techniques typically integrate a transfer DNA (T-DNA) fragment into the host genome, where genome integration context may have crucial effects on transgene expression. 
Current methods to identify T-DNA integration locations are either cumbersome or imprecise for repeat-rich genomes like maize. We developed a molecular protocol that utilizes long-read sequencing to enrich genomic T-DNA flanks, thus revealing T-DNA placement more precisely. Working to identify and characterize genetic variants responsible for specific phenotypes gives insight into how critical the quality of predicted gene function annotations can be to inform and guide experimental investigation. Functional annotation data are used for the interpretation of results from large-scale studies such as transcriptomics and proteomics. In addition, these data are also used to inform and prioritize candidate genes potentially responsible for a phenotype for positional cloning, genetic association, and other studies. To improve the quality of predicted gene functions available for all researchers working in maize, we generated a high-coverage, high-confidence, and reproducible functional annotation dataset for maize genes using the Gene Ontology. The methods we used to generate GO annotations for maize are generic and applicable to other plants. To enable application to other species, we formalized the method used to annotate maize as a containerized pipeline called GOMAP. GOMAP has been optimized for use in high-performance computing environments and has been tested on additional maize lines and other plant species.

    CaTCHing the functional and structural properties of chromosome folding

    Proper development requires that genes are expressed at the right time, in the right tissue, and at the right transcriptional level. In metazoans, this involves long-range cis-regulatory elements such as enhancers, which can be located up to hundreds of kilobases away from their target promoters. How enhancers find their target genes and avoid aberrant interactions with non-target genes is currently under intense investigation. The predominant model for enhancer function involves direct physical looping between the enhancer and its target promoter. The three-dimensional organization of chromatin, which accommodates promoter-enhancer interactions, therefore might play an important role in the specificity of these interactions. In the last decade, the development of a class of techniques called chromosome conformation capture (3C) and its derivatives has revolutionized the field of chromatin folding. In particular, the genome-wide version of 3C, Hi-C, revealed that mammalian chromosomes possess a rich hierarchy of folding layers, from multi-megabase compartments corresponding to mutually exclusive associations of active and inactive chromatin to topologically associating domains (TADs), which reflect regions with preferential internal interactions. Although the mechanisms that give rise to this hierarchy are still poorly understood, there is increasing evidence to suggest that TADs represent fundamental functional units for establishing the correct pattern of enhancer-promoter interactions. This is thought to occur through two complementary mechanisms: on the one hand, TADs are thought to increase the chances that regulatory elements meet each other by confining them within the same domain; on the other hand, they are thought to segregate physical interactions across domain boundaries, so that unwanted events occur less frequently. 
It is however unclear whether the properties that have been attributed to TADs are specific to TADs, or rather common features among the whole hierarchy. To address this question, I have implemented an algorithm named Caller of Topological Chromosomal Hierarchies (CaTCH). CaTCH is able to detect nested hierarchies of domains, allowing a comprehensive analysis of structural and functional properties across the folding hierarchy. By applying CaTCH to published Hi-C data in mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs), I showed that TADs emerge as a functionally privileged scale. In particular, TADs appear to be the scale where accumulation of CTCF at domain boundaries and transcriptional co-regulation during differentiation is maximal. Moreover, TADs appear to be the folding scale where the partitioning of interactions within transcriptionally active domains (and notably between active enhancers and promoters) is optimized. 3C-based methods have enabled fundamental discoveries such as the existence of TADs and CTCF-mediated chromatin loops. 3C methods detect chromatin interactions as ligation products after crosslinking the DNA. Crosslinking and ligation have often been criticized as potential sources of experimental biases, raising the question of whether TADs and CTCF-mediated chromatin loops actually exist in living cells. To address this, in collaboration with Josef Redolfi, we developed a new method termed ‘DamC’ which combines DNA methylation with physical modeling to detect chromosomal interactions in living cells, at the molecular scale, without relying on crosslinking and ligation. By applying DamC to mouse ESCs, we provide the first in vivo and crosslinking- and ligation-free validation of chromosomal structures detected by 3C methods, namely TADs and CTCF-mediated chromatin loops. DamC, together with 3C-based methods, has thus shown that mammalian chromosomes possess a rich hierarchy of folding layers. 
An important challenge in the field is to understand the mechanisms that drive the establishment of these folding layers. In this sense, polymer physics represents a powerful tool to gain mechanistic insights into the hierarchical folding of mammalian chromosomes. In polymer models, the scaling of contact probability, i.e. the contact probability as a function of genomic distance, has often been used to benchmark polymer simulations and test alternative models. However, the scaling of contact probability is only one of the many properties that characterize polymer models, raising the question of whether it would be enough to discriminate alternative polymer models. To address this, I have built finite-size heteropolymer models characterized by random interactions. I showed that finite-size effects, together with the heterogeneity of the interactions, are sufficient to reproduce the observed range of scaling of contact probability. This suggests that one should be careful in discriminating polymer models of chromatin folding based solely on the scaling. In conclusion, my findings have contributed to a better understanding of chromatin folding, which is essential for understanding how enhancers act on promoters. The comprehensive analyses using CaTCH have provided conceptually new insights into how the architectural functionality of TADs may be established. My work on heteropolymer models has highlighted the fact that one should be careful in using solely scaling to discriminate physical models for chromatin folding. Finally, the ability to detect TADs and chromatin loops using DamC represents a fundamental result since it provides the first orthogonal in vivo validation of chromosomal structures that had essentially relied on a single technology.
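The "scaling of contact probability" mentioned in this abstract, i.e. contact probability as a function of genomic distance, can be made concrete with a small Python sketch. It assumes a square, symmetric contact matrix binned at uniform resolution and estimates the scaling exponent as the least-squares slope in log-log space. The function names and the toy input are hypothetical, and this is not the analysis code used in the thesis.

```python
import math


def contact_probability_by_distance(contacts):
    """Average contact value for each genomic separation s (in bins).

    contacts is assumed to be a square, symmetric matrix (list of lists);
    the value for separation s is the mean of the s-th off-diagonal.
    """
    n = len(contacts)
    ps = {}
    for s in range(1, n):
        diagonal = [contacts[i][i + s] for i in range(n - s)]
        ps[s] = sum(diagonal) / len(diagonal)
    return ps


def scaling_exponent(ps):
    """Least-squares slope of log P(s) against log s, skipping empty diagonals."""
    points = [(math.log(s), math.log(p)) for s, p in ps.items() if p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance


# Toy matrix in which contacts decay as 1/s, so the fitted exponent is close to -1.
n = 6
toy = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n)] for i in range(n)]
print(scaling_exponent(contact_probability_by_distance(toy)))
```

A single fitted exponent compresses the whole curve into one number, which is in line with the caution expressed in the abstract about discriminating polymer models on scaling alone.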

    BIOINFORMATICS STRATEGIES FOR GENOMICS: EXAMPLES AND APPROACHES FOR TOMATO

    My PhD is funded by the Solanaceae Pollen thermotolerance – Initial Training Network (SPOT-ITN) in the frame of the European Marie Curie Actions. The consortium aims to investigate fundamental and applied aspects contributing to the protection of pollen at increased environmental temperatures, deciphering the underlying mechanisms of pollen development and its response to heat stress, starting from analyses on tomato. The findings are intended to serve as a guideline, and the procedures to be applicable to other plants in the future. In the light of the SPOT-ITN project objectives, and to provide a comprehensive bioinformatics infrastructure to support extensive genomics analyses in tomato, we collected, processed and integrated different resources, and organized them into dedicated databases with appropriate query user interfaces. This bioinformatics effort required the design of suitable software to reconcile the manifold resources from different cell information levels (genomics, transcriptomics, epigenomics), which is fundamental for data integration and analysis. The development of appropriate tools to mine the data from the “omics” approaches employed to trace pollen development and the heat stress response has also been necessary to the project. This thesis reports in detail the main efforts undertaken, the strategies and approaches developed, and the analyses conducted on the basis of these resources.

    WNT-DEPENDENT REGENERATIVE FUNCTION IS INDUCED IN LEUKEMIA-INITIATING AC133BRIGHT CELLS

    The Cancer Stem Cell model supported the notion that leukemia was initiated and maintained in vivo by a small fraction of leukemia-initiating cells (LICs). Previous studies have suggested the involvement of the Wnt signaling pathway in Acute Myeloid Leukemia (AML) through its ability to sustain the development of LICs. A novel hematopoietic stem and progenitor cell marker, monoclonal antibody AC133, recognizes the CD34bright CD38- subset of human acute myeloid leukemia cells, suggesting that it may be an early marker for the LICs. During the first part of my PhD program we evaluated the ability of the leukemic AC133+ fraction to engraft following xenotransplantation in the immunodeficient Rag2-/-γc-/- mouse model. The results showed that the surface marker AC133 is able to enrich for the cell fraction that contains the LICs. In consideration of our previously reported data, derived from the expression profiling analysis performed in normal (n=10) and leukemic (n=33) human long-term reconstituting AC133+ cells, we revealed that ligand-dependent Wnt signaling is induced in AML through a diffuse expression and release of WNT10B, a molecule associated with hematopoietic stem cell regeneration. In situ detection performed on bone marrow biopsies of AML patients showed the activation of the Wnt pathway, through the concomitant presence of the ligand WNT10B and of the active dephosphorylated β-catenin form, suggesting an autocrine/paracrine-type ligand-dependent activation mechanism. In consideration of the link between hematopoietic regeneration and developmental signaling, we transplanted primary AC133+ AML A46 cells into developing zebrafish. This biosensor model revealed the formation of ectopic structures by activation of dorsal organizer markers that act downstream of the Wnt pathway. These results suggested that the misappropriation of Wnt-associated functions can promote pathological stem cell-like regeneration responsiveness. 
The analyses performed in situ retained information on cellular localization, enabling determination of the activity status of individual cells and providing a view of the tumor environment. Taking this into consideration, during the second part of my PhD program, I set up the application of a new in situ method for localized detection and genotyping of individual transcripts directly in cells and tissues. The mRNA in situ detection technique is based on padlock probe ligation and target-primed rolling circle amplification, allowing single-nucleotide resolution in heterogeneous tissues. The mRNA in situ detection performed on bone marrow biopsies derived from AML patients showed a diffuse localization pattern of the WNT10B molecule in the tissue. Conversely, only the AC133bright cell population shows the Wnt signaling activation signature represented by the cytoplasmic accumulation and nuclear translocation of the active form of β-catenin. Although we previously showed that the regenerative function of the WNT signaling pathway is defined by the up-regulation of the WNT10B, WNT10A, WNT2B and WNT6 loci, we identified WNT10B as a major locus associated with the regenerative function and over-expressed by all AML patients. By molecular evaluation of the WNT10B transcript, we isolated an aberrant splicing variant (WNT10BIVS1) that identifies the Non Core-Binding Factor Leukemia (NCBFL) class and whose potential role is discussed. Moreover, we demonstrate that the function of the "leukemia stem cell", present in the cell population enriched for the marker AC133bright, is strictly related to the regenerative function associated with WNT signaling, defining the key role of the WNT10B ligand as a specific molecular marker for leukemogenesis. This thesis defines new suitable approaches to characterize the leukemia-initiating cells (LICs) and suggests the role of WNT10B as a new suitable target for AML.

    HIGH-THROUGHPUT SEQUENCING CHARACTERIZATION OF DNA CYCLIZATION, WITH APPLICATIONS TO DNA LOOPING

    DNA flexibility is important both for fundamental biophysics and because DNA flexibility affects DNA packaging and regulation of gene expression through DNA looping. DNA flexibility has been studied with experiments ranging from biochemical ring closure or DNA looping experiments to AFM, crystallography, and tethered particle microscopy. Even so, the flexibility of DNA in vitro and in vivo remains controversial. In an attempt to resolve this controversy, we have developed a high-throughput, internally controlled, comparative ligation methodology using a library of 1023 distinct DNA sequences ranging in length from 119 to 219 base pairs, constructed via ligation of pools of synthetic DNA of different lengths and PCR. The design incorporated barcoding for redundant identification of each molecule, allowing for a ligation reaction to be performed on the entire library in one reaction mixture. Two DNA concentrations were used in separate reactions to promote either unimolecular cyclization or bimolecular ligation and thereby explore a wide range of cyclization efficiencies (J factors). Half of each reaction mixture was treated with BAL-31 to destroy non-cyclized molecules. All products were linearized by restriction digestion and Illumina indices were added. The initial library and reaction mixtures were sequenced in a single Illumina MiSeq run. From roughly 15 million assembled reads, over 13 million were identified using software written to identify and sort our sequence library. Each molecule was counted for each condition. From our analysis we see no evidence of extreme bendability at short DNA lengths. At higher DNA concentrations, where bimolecular products are produced more rapidly, we see oscillatory behavior as a function of length. In contrast, at lower concentrations, where unimolecular products dominate, we observe no helical variation because all molecules are able to cyclize given enough time. 
In order to determine J factors through cyclization, bimolecular products must also be counted. Given the constraints of this experiment, not all bimolecular products could be observed. Future experimentation can be performed to determine J factors across this size range, the results of which will improve coarse-grained modeling of DNA. Extension of this methodology should be applicable to DNA loops anchored by proteins.

    Comprehensive Assessments of the Genetic Determinants in Salmonella Typhimurium for Fitness under Host Stressors: Oxidative Stress and Iron Restriction

    Salmonella is an intracellular pathogen that infects a wide range of hosts. The infected host utilizes reactive oxygen species (ROS) and iron restriction to eliminate the pathogen. We used proteogenomics to determine the candidate genes and proteins that have a role in resistance of S. Typhimurium to H2O2. For Tn-seq, a highly saturated Tn5 library was grown in vitro under either 2.5 mM (H2O2L) or 3.5 mM H2O2 (H2O2H). We identified two sets of overlapping genes that are required for resistance of S. Typhimurium to H2O2L and H2O2H, and the results were validated via phenotypic evaluation of 50 selected mutants. The enriched pathways for resistance to H2O2 included DNA repair, aromatic amino acid biosynthesis (aroBK), Fe-S cluster biosynthesis, iron homeostasis and a putative iron transporter system (ybbKLM), flagellar genes (fliBC), H2O2 scavenging enzymes, and DNA adenine methylase. Proteomics revealed that the majority of essential proteins, including ribosomal proteins, were downregulated upon exposure to H2O2. A subset of proteins identified by Tn-seq were analyzed by targeted proteomics, and 70 % of them were upregulated upon exposure to H2O2. Further, we assessed the genome of S. Typhimurium under gradient iron-restricted conditions using Tn-seq. In addition to conditionally essential genes that mediate pathogen survival under iron-restricted conditions, we found ROS-dependent essential genes. Based on this, we expand the ROS-antibiotic mediated killing model, which asserts that bactericidal antibiotics induce ROS formation, which ultimately contributes to cell death. We show that impairment of many essential genes with transposons, without antibiotic interference, induces ROS formation, and that the death of these mutants can be prevented with an iron chelator. Tn-seq reveals that one-third of the S. Typhimurium essential genome is ROS-dependent, far beyond antibiotic targets, as the corresponding mutants can grow very slowly in iron-restricted conditions. 
Interestingly, the majority of antibiotic target genes are ROS-dependent. We propose that ROS-independent essential genes may be better targets for antibiotic development because the cells die immediately following the disruption of the essential gene. This work expands our knowledge about mechanisms of S. Typhimurium survival in macrophages and the role of ROS in cell death following essential gene disruption, and provides novel targets for development of new antibiotics.
