23 research outputs found

    Using signal processing, evolutionary computation, and machine learning to identify transposable elements in genomes

    About half of the human genome consists of transposable elements (TE's), sequences that have many copies of themselves distributed throughout the genome. All genomes, from bacterial to human, contain TE's. TE's affect genome function by either creating proteins directly or affecting genome regulation. They serve as molecular fossils, giving clues to the evolutionary history of the organism. TE's are often challenging to identify because they are fragmentary or heavily mutated. In this thesis, novel features for the detection and study of TE's are developed. These features are of two types. The first type are statistical features based on the Fourier transform used to assess reading frame use. These features measure how different the reading frame use is from that of a random sequence, which reading frames the sequence is using, and the proportion of use of the active reading frames. The second type of feature, called side effect machine (SEM) features, are generated by finite state machines augmented with counters that track the number of times the state is visited. These counters then become features of the sequence. The number of possible SEM features is super-exponential in the number of states. New methods for selecting useful feature subsets that incorporate a genetic algorithm and a novel clustering method are introduced. The features produced reveal structural characteristics of the sequences of potential interest to biologists. A detailed analysis of the genetic algorithm, its fitness functions, and its fitness landscapes is performed. The features are used, together with features used in existing exon finding algorithms, to build classifiers that distinguish TE's from other genomic sequences in humans, fruit flies, and ciliates. The classifiers achieve high accuracy (> 85%) on a variety of TE classification problems. The classifiers are used to scan large genomes for TE's. 
In addition, the features are used to describe the TE's in the newly sequenced ciliate Tetrahymena thermophila, giving biologists information for forming experimentally testable hypotheses about the role of these TE's and the mechanisms that govern them.
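The side effect machine idea in the abstract above can be illustrated with a minimal sketch: a finite state machine reads a DNA sequence, and the number of visits to each state becomes a feature vector. The transition table, the function name `sem_features`, and the choice not to count the start state are assumptions made for illustration; they are not taken from the thesis.

```python
from collections import Counter

# Hypothetical 3-state machine over the DNA alphabet. The transition table is
# illustrative only and is not taken from the thesis.
TRANSITIONS = {
    0: {"A": 1, "C": 0, "G": 2, "T": 0},
    1: {"A": 1, "C": 2, "G": 0, "T": 2},
    2: {"A": 0, "C": 1, "G": 2, "T": 1},
}


def sem_features(sequence, transitions=TRANSITIONS, start=0):
    """Return one visit count per state after reading the sequence.

    Assumption: the start state is not counted until a transition enters it.
    """
    counts = Counter()
    state = start
    for base in sequence.upper():
        if base not in transitions[state]:
            continue  # skip ambiguous bases such as N
        state = transitions[state][base]
        counts[state] += 1
    # One feature per state, in a fixed state order.
    return [counts[s] for s in sorted(transitions)]


if __name__ == "__main__":
    print(sem_features("ACGT"))
```

In a feature-selection setting such as the one the abstract describes, a search procedure would vary the transition table and keep the machines whose counts best separate the sequence classes; that search is not shown here.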

    Genome reconstruction and combinatoric analyses of rearrangement evolution

    Cancer is often associated with a high number of large-scale structural rearrangements. In a highly selective environment, some 'driver' mutations that confer a clonal growth advantage are positively selected and account for further cancer development. Clarifying their nature, as well as their contribution to the pathology, is a major current focus of biomedical research. Next-generation sequencing technologies can now be used to generate high-resolution datasets of these alterations in cancer genomes. This project was developed along two main lines: 1) the reconstruction of aberrant cancer karyotypes, together with their underlying evolutionary history; 2) the elucidation of some combinatorial properties associated with gene duplications. We applied graph theory to the problem of reconstructing the final cancer genome sequence; additionally, we developed an algorithmic approach for reconstructing a multi-step evolution consistent with read coverage and paired-end data, giving insight into the possible molecular mechanisms underlying rearrangements. Looking at the combinatorics of both tandem and inverted duplication, we developed an algebraic formalism for representing these processes. This allowed us both to explore the geometric properties of sequences arising by Tandem Duplication (TD) and to obtain a recursion for the number of tandem duplication evolutions after n events. Such results are missing for inverted duplications, whose combinatorial properties have nevertheless been elucidated in depth. Our results have allowed: 1) the identification, through an original approach, of potential rearrangement mechanisms associated with cancer development, and 2) the definition and mathematical description of the complete evolutionary space of specific rearrangement classes.
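To make the tandem duplication process in the abstract above concrete, the following sketch enumerates by brute force the distinct sequences reachable after a given number of events. It assumes a tandem duplication copies one contiguous segment and inserts the copy immediately after the original. The function names are hypothetical, and the sketch counts distinct resulting sequences; it does not reproduce the recursion over evolutions derived in the thesis.

```python
def tandem_duplications(seq):
    """Return all distinct sequences obtained by one tandem duplication.

    Assumption: the segment seq[i:j] is copied and the copy is inserted
    directly after the original segment.
    """
    return {
        seq[:j] + seq[i:j] + seq[j:]
        for i in range(len(seq))
        for j in range(i + 1, len(seq) + 1)
    }


def reachable(seed, n_events):
    """Return distinct sequences reachable from seed after exactly n_events."""
    current = {seed}
    for _ in range(n_events):
        current = {child for s in current for child in tandem_duplications(s)}
    return current


if __name__ == "__main__":
    for n in range(4):
        print(n, len(reachable("ab", n)))
```

Because the enumeration grows quickly with the number of events, it is only practical for short seeds and few events, which is one reason a closed recursion is of interest.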

    Wheat Improvement

    This open-access textbook provides a comprehensive, up-to-date guide for students and practitioners wishing to access in a single volume the key disciplines and principles of wheat breeding. Wheat is a cornerstone of food security: it is the most widely grown of any crop and provides 20% of all human calories and protein. The authorship of this book includes world-class researchers and breeders whose expertise spans cutting-edge academic science all the way to impacts in farmers’ fields. The book’s themes and authors were selected to provide a didactic work that considers the background to wheat improvement, current mainstream breeding approaches, and translational research and avant-garde technologies that enable new breakthroughs in science to impact productivity. While the volume provides an overview for professionals interested in wheat, many of the ideas and methods presented are equally relevant to small grain cereals and crop improvement in general. The book is affordable and, because it is open access, can be readily shared and translated, in whole or in part, for university classes, members of breeding teams (from directors to technicians), conference participants, extension agents and farmers. Given the challenges currently faced by academia, industry and national wheat programs to produce higher crop yields, often with fewer inputs and under increasingly harsh climates, this volume is a timely addition to their toolkit.

    Linkage mapping in tetraploid blackberry (Rubus spp.) using high-throughput genomic sequencing and restriction-site associated DNA sequencing (RAD-seq)

    Genomic molecular marker research and linkage mapping in cultivated polyploid species of Rubus, in particular blackberry, has lagged behind diploid members of the genus due to the complex analyses required to deduce allele dosage among homologous and homeologous chromosomes. Traditional techniques for studying linkage in tetraploid species, including Rubus, have made use of limited numbers of scorable markers – such as expressed sequence tag simple sequence repeats (EST-SSRs) or amplified fragment length polymorphisms (AFLPs) – and large mapping populations essential for the observation of recombination events among dominant markers. More recently, high-throughput genomic sequencing has enabled the discovery of an abundance of co-dominant single nucleotide polymorphism (SNP) markers, in numbers orders of magnitude greater than those from previous marker discovery methods. The purposes of this research are 1) to derive a linkage map of single nucleotide polymorphisms (SNPs) in tetraploid blackberry (Rubus spp.) using the reduced-representation, restriction-site associated genomic DNA sequencing method; 2) to discover quantitative trait loci (QTL) associated with traits of economic importance to blackberry breeders and producers; and 3) to characterize the structure and diversity within the blackberry genome, especially through comparisons against its diploid cousin Rubus idaeus, the red raspberry. The techniques developed in the present study extend the utility of restriction-site associated DNA sequencing (RAD-seq) to tetraploid species, and will be of interest to others working with polyploid species. This research generated high-density maps for two blackberry cultivars, ‘Chester Thornless’ and ‘Prime-Jim’ (‘APF-12’), consisting of 2,118 markers in 29 linkage groups spanning a total map distance of 1,059.1 cM and 1,759 markers in 31 linkage groups spanning 1,024.6 cM, respectively.
The primocane-fruiting locus F was determined to be heterozygous duplex in ‘Chester Thornless’, and supporting evidence for the positioning of the thornless locus S at one end of the fourth linkage group was uncovered. Furthermore, the within-genome heterozygosity was determined to be 1.915% for ‘Chester Thornless’ and 1.749% for ‘Prime-Jim’. Double reduction was observed in the data to occur at a minimum rate of 7.94% across all heterozygous loci. This value enabled the calculation of observation-based estimates for the inbreeding coefficients for both cultivars: 0.04554 for ‘Chester Thornless’ and 0.04235 for ‘Prime-Jim’ at a predicted maximum

    2019 GREAT Day Program

    SUNY Geneseo’s Thirteenth Annual GREAT Day. https://knightscholar.geneseo.edu/program-2007/1013/thumbnail.jp

    Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth

    Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from the literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R scripts coupled to parallel computing were created to calculate >260,000 phylogenetically controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to scale strongly and positively with genome size across most eukaryotic lineages. In contrast to previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size.
Our evolutionary correlations are robust to different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome.
Table of contents:
Cover page, Abstract, Acknowledgements, Index
1. The structure of this thesis 1.1. Structure of this PhD dissertation 1.2. Publications of this PhD dissertation 1.3. Computational infrastructure and resources 1.4. Disclosure of financial support and information use 1.5. Acknowledgements 1.6. Author contributions and use of impersonal and personal pronouns
2. Biological background 2.1. The complexity of the eukaryotic genome 2.2. The problem of counting and defining “genes” in eukaryotes 2.3. The “function” concept for genes and “dark matter” 2.4. Increases of organismal complexity on Earth through multicellularity 2.5. Multicellularity is a “fitness transition” in individuality 2.6. The complexity of cell differentiation in multicellularity
3. Technical background 3.1. The Phylogenetic Comparative Method (PCM) 3.2. RNA secondary structure prediction 3.3. Some standards for genome and gene annotation
4. What is in a eukaryotic genome? GenomeContent provides a good answer 4.1. Background 4.2. Motivation: an interoperable tool for data retrieval of gene annotations 4.3. Methods 4.4. Results 4.5. Discussion
5. The evolutionary correlation between genome size and ncDNA 5.1. Background 5.2. Motivation: estimating the relationship between genome size and ncDNA 5.3. Methods 5.4. Results 5.5. Discussion
6. The relationship between non-coding DNA and Complex Multicellularity 6.1. Background 6.2. Motivation: How to define and measure complex multicellularity across eukaryotes? 6.3. Methods 6.4. Results 6.5. Discussion
7. The ceRNA motif pipeline: regulation of microRNAs by target mimics 7.1. Background 7.2. A revisited protocol for the computational analysis of Target Mimics 7.3. Motivation: a novel pipeline for ceRNA motif discovery 7.4. Methods 7.5. Results 7.6. Discussion
8. Conclusions and outlook 8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses 8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya 8.3. “Complex multicellularity” is a major evolutionary transition 8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth
9. Supplementary Data
Bibliography, Curriculum Scientiae, Selbständigkeitserklärung (declaration of authorship)

    2018 GREAT Day Program

    SUNY Geneseo’s Twelfth Annual GREAT Day. https://knightscholar.geneseo.edu/program-2007/1012/thumbnail.jp

    A complex systems approach to education in Switzerland

    The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems capable of coping successfully with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.

    Genomic Selection for Crop Improvement: New Molecular Breeding Strategies for Crop Improvement

    Genomic Selection for Crop Improvement serves as a handbook for users by providing basic as well as advanced understanding of genomic selection. This useful review explains germplasm use, phenotyping evaluation, marker genotyping methods, and the statistical models involved in genomic selection. It also includes examples of ongoing genomic selection activities for crop improvement and of efforts initiated to deploy genomic selection in some important crops. To help readers understand the potential of genomic selection (GS) breeding, the book brings this information together as a ready reference for geneticists and plant breeders.

    Intrinsic Hardware Evolution on the Transistor Level

    This thesis presents a novel approach to the automated synthesis of analog circuits. Evolutionary algorithms are used in conjunction with a fitness evaluation on a dedicated ASIC that serves as the analog substrate for the newly bred candidate solutions. The advantage of evaluating the candidate circuits directly in hardware is twofold. First, it may speed up the evolutionary algorithms, because hardware tests can usually be performed faster than simulations. Second, the evolved circuits are guaranteed to work on a real piece of silicon. The proposed approach is realized as a hardware evolution system consisting of an IBM-compatible general-purpose computer that hosts the evolutionary algorithm, an FPGA-based mixed-signal test board, and the analog substrate. The latter is designed as a Field Programmable Transistor Array (FPTA) whose programmable transistor cells can be almost freely connected. The transistor cells can be configured to adopt one of 75 different channel geometries. The chip was produced in a 0.6 µm CMOS process and provides ample means for the input and output of analog signals. The configuration is stored in SRAM cells embedded in the programmable transistor cells. The hardware evolution system is used for numerous evolution experiments targeted at a wide variety of circuit functionalities. These comprise logic gates, Gaussian function circuits, D/A converters, low- and highpass filters, tone discriminators, and comparators. The experimental results are thoroughly analyzed and discussed with respect to related work.