
    Structural Variation in the Maize Genome

    Whole-genome array-based comparative genomic hybridization (aCGH) was used to study structural variation between two elite maize inbred lines, B73 and Mo17. Several hundred copy number variants (CNVs) and several thousand presence/absence variants (PAVs) were discovered. This level of structural variation among haplotypes is unprecedented among higher eukaryotes. Haplotype-specific PAVs that encompass hundreds of single-copy, expressed genes may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop. aCGH can also be used to genotype complex genomes such as that of maize. Approximately 200,000 oligonucleotide probes whose hybridization signals differ significantly between B73 and Mo17 were used to genotype two recombinant inbred lines (RILs) derived from a cross between these two inbreds. The resulting genotype calls are highly consistent with marker data generated in previous experiments using alternative technologies. A careful analysis of the aCGH data from the two RILs relative to their inbred parents revealed several hundred apparently de novo CNVs. Further analyses showed that these recurrent, apparently de novo CNVs arise from the segregation of single-copy homologous sequences located at non-allelic positions in the two parental inbreds. These changes in the genome content of the RILs were validated by both PCR and whole-genome shotgun sequencing.
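A minimal sketch of the probe-level genotyping idea, assuming each probe has a log2 hybridization ratio (sample versus a common reference) for B73, Mo17 and the RIL: a probe is treated as informative only if the two parental signals differ, and the RIL is scored as carrying whichever parent's signal it lies closer to. The function name `genotype_ril`, the 1.0 log2 cutoff and the toy values are hypothetical; the study's own statistical test for "significant differences" is not reproduced here.

```python
from typing import List, Optional, Sequence


def genotype_ril(b73: Sequence[float], mo17: Sequence[float], ril: Sequence[float],
                 min_diff: float = 1.0) -> List[Optional[str]]:
    """Assign each probe in a RIL to the parent whose log2 signal it most resembles.

    Probes whose parental signals differ by less than `min_diff` (a hypothetical
    cutoff) are uninformative and scored as None. Exact ties are scored as B73.
    """
    calls: List[Optional[str]] = []
    for p1, p2, r in zip(b73, mo17, ril):
        if abs(p1 - p2) < min_diff:
            calls.append(None)          # parents too similar to discriminate
        elif abs(r - p1) <= abs(r - p2):
            calls.append("B73")         # RIL signal closer to the B73 parent
        else:
            calls.append("Mo17")        # RIL signal closer to the Mo17 parent
    return calls


# Toy example with four probes; the second probe is uninformative.
print(genotype_ril([0.1, 0.0, 2.0, -1.5], [-2.0, 0.2, 0.0, 0.5], [0.0, 0.1, 0.2, -1.2]))
```

In practice, neighbouring probe calls would be aggregated along the chromosome before declaring a segment's genotype; that smoothing step is omitted here.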

    Assembly and Compositional Analysis of Human Genomic DNA - Doctoral Dissertation, August 2002

    In 1990, the United States Human Genome Project was initiated as a fifteen-year endeavor to sequence the approximately three billion bases that make up the human genome (Vaughan, 1996). As of December 31, 2001, public sequencing efforts had produced a total of 2.01 billion finished bases, representing 63.0% of the human genome (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&&ORG=Hs), to a Bermuda-quality error rate of 1/10000 (Smith and Carrano, 1996). In addition, 1.11 billion bases, representing 34.8% of the human genome, had been sequenced to rough-draft level. Efforts such as UCSC's GoldenPath (Kent and Haussler, 2001) and NCBI's contig assembly (Jang et al., 1999) attempt to assemble the human genome by incorporating both finished and rough-draft sequence. The availability of human genome data allows us to ask how specific regions of the human genome are maintained. We consider two hypotheses for the maintenance of high G+C regions: the presence of specific repetitive elements and compositional biases in mutation. Our results rule out the possibility that the G+C content of repetitive elements determines the high and low G+C regions of the human genome. We find that mutation rates are compositionally biased; however, these biases are not responsible for maintaining high G+C regions. We also show that regions of the human genome under less selective pressure mutate toward a higher A+T composition, regardless of the surrounding G+C composition. Analyzing sequence organization, we show that the findings of previous studies of isochore regions (Bernardi, 1993) cannot be generalized across the human genome. In addition, we propose a method to assemble only the finished parts of the human genome into larger contigs. Analysis of these contigs can yield biologically meaningful data and insights into genetic variation and evolution.
I propose a method to aid single nucleotide polymorphism (SNP) detection, which can help determine differences within a population. I also discuss a dynamic-programming-based approach to sequence assembly validation and to the detection of large-scale polymorphisms within a population, made possible by the availability of large human sequence contigs.
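The compositional analyses above start from the G+C fraction of sequence windows. The sketch below computes it in consecutive non-overlapping windows, ignoring ambiguous bases such as `N`; the function names and the window size in the example are illustrative assumptions, not the dissertation's parameters.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """G+C fraction of `seq`, counting only unambiguous A/C/G/T bases.

    Returns NaN when the window contains no unambiguous bases.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int) -> List[float]:
    """G+C fraction in consecutive non-overlapping windows (a trailing partial window is kept)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


print(windowed_gc("GGCCATATNNGC", 4))  # three windows: GGCC, ATAT, NNGC
```

Real isochore-scale analyses use much larger windows; a window of 4 is used here only so the example is easy to check by hand.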

    Evolution of regulatory complexes: a many-body system

    The recent advent of large-scale genomic sequence data and improvements in sequencing technology have enabled population genetics to advance from a mostly abstract theoretical basis to a quantitative molecular description. However, functional units in DNA are typically combinations of interacting nucleotide segments, and the evolutionary forces acting on these segments can produce very complicated population dynamics. The goal is to formulate these interactions so that the macroscopic features are independent of the microscopic details, as in statistical mechanics. In this thesis, I discuss the evolutionary dynamics of regulatory sequences, which control protein production in cells. One of the primary forms of regulation occurs through interactions between proteins called transcription factors and binding sites in the DNA sequence; the strength of these interactions influences an individual's fitness in the population. What makes this an ideal model system for quantitative analysis of genomic evolution is the possibility of inferring this relationship. Gene regulation is much more complex in higher eukaryotes than in prokaryotes and yeast. Regulatory information is organized in modules of multiple binding sites linked to a common function. In Chapter 2, we show that binding site complexes are commonly formed by local sequence duplications rather than from scratch by single point mutations. We also show that the underlying regulatory grammar is attuned to this mechanism, such that duplication events confer an adaptive advantage. Regulatory complexes resemble a many-particle system whose function emerges from the collective dynamics of its elements. In Chapter 3, we develop a thermodynamic framework to characterize the effective affinity of site complexes for multiple transcription factors with cooperative binding.
These affinities are the phenotype, or trait, of binding complexes on which selection acts, and we characterize their evolution. From yeast genome polymorphism data, we infer a fitness landscape as a function of binding affinity using the novel method developed in Chapter 4. This method of quantitative trait analysis can handle long-range correlations between sites, which arise in asexual populations. Our fitness landscape quantitatively predicts the amount of conservation of the phenotype, as well as the amount of compensatory changes between sites. Our results open a new avenue for understanding the regulatory "grammar" of eukaryotic genomes based on quantitative evolutionary models. They show that a combination of theoretical models, high-throughput experimental measurements, and analysis of genomic variation is necessary for a proper quantitative understanding of biological systems.
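As a hedged illustration of the kind of thermodynamic calculation described for Chapter 3 (a generic statistical-mechanics model, not the thesis's own formulation), the sketch below enumerates every occupancy state of a small binding-site complex. Each site i has a hypothetical statistical weight q_i (for example, a factor concentration divided by a dissociation constant), each pair of simultaneously bound sites contributes a cooperativity factor omega_ij, and the probability that the complex is bound at all is 1 - 1/Z, where Z is the partition function and the empty state has weight 1.

```python
import itertools
from typing import Sequence


def occupancy(q: Sequence[float], omega: Sequence[Sequence[float]]) -> float:
    """Probability that at least one site of a binding-site complex is occupied.

    q[i]        -- dimensionless statistical weight of site i (assumed, e.g. [TF]/K_d,i)
    omega[i][j] -- cooperativity factor when sites i and j are both bound (1.0 = independent);
                   only entries with i < j are read.
    Every one of the 2**n occupancy states is enumerated, so this suits small complexes only.
    """
    n = len(q)
    z = 0.0  # partition function
    for state in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for i in range(n):
            if state[i]:
                weight *= q[i]                 # contribution of each bound site
        for i, j in itertools.combinations(range(n), 2):
            if state[i] and state[j]:
                weight *= omega[i][j]          # pairwise cooperativity
        z += weight
    return 1.0 - 1.0 / z                       # 1 minus the weight of the empty state


print(occupancy([1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]]))    # independent sites
print(occupancy([1.0, 1.0], [[1.0, 10.0], [10.0, 1.0]]))  # cooperative sites
```

With two sites of weight 1, independent binding gives Z = 4 and an occupancy of 0.75, while a cooperativity factor of 10 gives Z = 13 and an occupancy of about 0.92. One way to collapse the complex into a single effective affinity, assumed here for illustration, is q_eff = Z - 1, because a lone site with that weight has exactly the same occupancy.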

    Quantitative Approaches to Understanding Cancer Genomes.

    Recent advances in technology have enabled systematic, genome-wide analysis of cancer genomes, providing greater insight into the genetic basis of cancer development and a deeper understanding of the human genome. The focus of this work is to identify genomic alterations that potentially confer risk for colorectal and breast cancers by performing genome-wide analysis with single nucleotide polymorphism (SNP) genotyping and next-generation sequencing (NGS) platforms. My first dissertation project involves deep sequencing of the genomes of individuals from a single family to identify novel mutations in hereditary mixed polyposis syndrome, a rare form of colorectal cancer with no known genetic basis. A novel candidate gene, ZNF426, was identified, and its decreased expression was confirmed in tumors from affected individuals. The second part of my dissertation evaluates methods for detecting somatic copy number alterations on chromosome 18 in colorectal cancer and applies statistical methods that make use of poor-quality tumor data. Using genotyping and expression data from tumors, a variety of structural alterations were identified on chromosome 18. Additionally, I demonstrated the utility of new statistical methods for identifying copy number alterations in array data with high background noise. The goal of my third project was to evaluate the contribution of consanguinity to breast cancer risk in Arab women without mutations in the BRCA1 and BRCA2 genes. The hypothesis of this study is that autosomal recessive genes responsible for genetic susceptibility to breast cancer should contribute more among consanguineous families, because consanguinity increases the probability of sharing alleles identical by descent. Six unrelated individuals with breast cancer shared a 200 kb overlapping region of homozygous SNPs on chromosome 9q33.2-33.3, which harbors an important candidate gene for cancer risk, LHX2.
Whole-genome analysis allows greater sequencing depth and higher throughput at lower cost, adding a new dimension to our understanding of cancer genetics. Future progress in these technologies and in bioinformatics methods will reduce costs and improve the sensitivity and accuracy of mutation detection.
Ph.D., Human Genetics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89716/1/gornickm_1.pd
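A minimal sketch of the homozygosity-mapping step in the third project, assuming genotypes coded as alternate-allele counts (0, 1, 2) at SNP positions sorted along one chromosome: runs of consecutive homozygous calls are found for each individual, then intersected across individuals. The minimum run length, the helper names and the toy data are hypothetical, and the sketch does not check that the shared runs carry the same haplotype, which an identity-by-descent analysis would also need to consider.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (first_bp, last_bp) of a run, inclusive


def homozygous_runs(positions: Sequence[int], genotypes: Sequence[int],
                    min_snps: int) -> List[Interval]:
    """Runs of at least `min_snps` consecutive homozygous calls (0 or 2 alternate alleles)."""
    if min_snps < 1:
        raise ValueError("min_snps must be >= 1")
    runs: List[Interval] = []
    start = end = 0
    count = 0
    for pos, g in zip(positions, genotypes):
        if g in (0, 2):              # homozygous call extends the current run
            if count == 0:
                start = pos
            end = pos
            count += 1
        else:                        # heterozygous call closes the current run
            if count >= min_snps:
                runs.append((start, end))
            count = 0
    if count >= min_snps:            # close a run that reaches the last SNP
        runs.append((start, end))
    return runs


def shared_regions(runs_per_individual: Sequence[List[Interval]]) -> List[Interval]:
    """Intervals covered by a homozygous run in every individual (input must be non-empty)."""
    shared = list(runs_per_individual[0])
    for runs in runs_per_individual[1:]:
        shared = [(max(a0, b0), min(a1, b1))
                  for a0, a1 in shared
                  for b0, b1 in runs
                  if max(a0, b0) <= min(a1, b1)]
    return shared


pos = [100, 200, 300, 400, 500, 600]
r1 = homozygous_runs(pos, [0, 2, 2, 0, 1, 0], min_snps=3)
r2 = homozygous_runs(pos, [1, 0, 0, 2, 2, 2], min_snps=3)
print(r1, r2, shared_regions([r1, r2]))
```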

    Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content

    Following the domestication of maize over the past ~10,000 years, breeders have exploited the extensive genetic diversity of this species to mold its phenotype to meet human needs. The extent of structural variation, including copy number variation (CNV) and presence/absence variation (PAV), which are thought to contribute to the extraordinary phenotypic diversity and plasticity of this important crop, has not been elucidated. Whole-genome, array-based comparative genomic hybridization (CGH) revealed a level of structural diversity between the inbred lines B73 and Mo17 that is unprecedented among higher eukaryotes. A detailed analysis of altered segments of DNA conservatively estimates that there are several hundred CNV sequences between the two genotypes, as well as several thousand PAV sequences that are present in B73 but not in Mo17. Haplotype-specific PAVs contain hundreds of single-copy, expressed genes that may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop.

    Beyond the fingerprints: From biometric to genetics

    Alongside demographic screening, a deeper biosocial interest in India can be observed at the scale of groups and subpopulations. Several agencies (university consortia, departments of human forensic genetics) are investigating population bio-history and performing genotyping. Most of the results concern genetic structure and admixture within a phylogenetic network connecting clades and sub-clades. Two large ancestral stocks are assumed to lie at the origin of the demographic mosaic. The first, Ancestral North Indians (ANI), was centred in a western Eurasian area and the Middle East. The second, Ancestral South Indians (ASI), was centred in the Andaman Islands but is prevalent in South India. From this perspective, authenticity is associated with autochthony: the “true” Indians are those who first populated the territory. Thus, the label adivasi (aboriginal) designates the “originals”. Their roots, at both the bio-genetic and the cultural level, belong to the deepest layer of the variegated pan-Indian scenario. This simplified version coexists with a divergent theory linked to modernized frames of the classic hierarchical background. In a seminal study, M. Bamshad showed that the social pyramid corresponded to the distribution of Y-chromosome lineages: the frequency of haplogroup R1a1 markers was highest among the top castes and lowest among the Shudras and outcastes. Supporters of Hindu supremacy wear the R1a1 brand as a symbol of identity that confirms the Vedic myth in which society is depicted as a body (whose limbs represent the different classes), supporting a renewed image of national solidarity.
Alongside the demographic screening conducted under India's Aadhaar programme, a deeper research interest can be observed at the bio-genetic level. Government agencies and university consortia have long conducted bio-historical studies on the phylogenetic network used to classify ethnic and linguistic groups. Two large stocks (ANI, Ancestral North Indians, and ASI, Ancestral South Indians) are thought to lie at the origin of the entire Indian ethnic mosaic. In this picture, autochthony and authenticity are closely connected. The Adivasi represent the original layer, by cultural and ethnic as well as bio-genetic criteria. But a divergent theory coexists with this one, a theory that sees the close similarity between the genetic profiles of India's upper castes and European populations as the decisive trait of authentic Hindu identity. The R1a1 haplotype of the Y chromosome becomes the biological emblem of supreme, typically Brahmanical, identity and, at the same time, the pre-eminent core of national identity.

    Analysis and Annotation of Nucleic Acid Sequence

    Developmental constraints, innovations and robustness

    During my PhD, I have been working on Evo-Devo patterns in transcriptomes (especially the debate around the hourglass model), with an emphasis on adaptation. I have characterized patterns in model organisms in terms of constraints and, especially, positive selection. I found that the phylotypic stage (a stage in mid-embryonic development) is an evolutionary lockdown, with stronger purifying selection and less positive selection than other stages, in terms of the evolution of both protein sequences and regulatory elements. To study the adaptive evolution of gene regulation during development, I have developed a machine-learning-based in silico mutagenesis approach to detect positive selection on regulatory elements. In addition to transcriptome evolution, I have been working on the tension between precision and stochasticity of gene expression during development. More precisely, I have shown that expression noise follows an hourglass pattern, with lower noise at the phylotypic stage. This pattern can be explained by stronger histone-modification-mediated noise control at this stage. In addition, I propose that histone modifications contribute to mutational robustness in regulatory elements, and thus to conserved expression levels. These results provide insight into the role of robustness in the phenotypic and genetic patterns of evolutionary conservation in animal development.
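In its simplest form, in silico mutagenesis scores every single-base substitution of a sequence with a predictive model and records the change relative to the unmutated sequence. The sketch below shows only that scanning step. `toy_motif_score`, which counts occurrences of a hypothetical TATA motif, is a stand-in for a trained machine-learning model; how the resulting effects are turned into a test for positive selection is not reproduced here.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def in_silico_mutagenesis(seq: str, score: Callable[[str], float]) -> List[Tuple[int, str, float]]:
    """Score every single-base substitution of `seq`.

    Returns (position, alternate_base, score(mutant) - score(reference)) tuples.
    """
    ref_score = score(seq)
    effects: List[Tuple[int, str, float]] = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue                                  # skip the reference base itself
            mutant = seq[:i] + alt + seq[i + 1:]
            effects.append((i, alt, score(mutant) - ref_score))
    return effects


def toy_motif_score(seq: str, motif: str = "TATA") -> float:
    """Hypothetical model stand-in: number of (possibly overlapping) motif occurrences."""
    k = len(motif)
    return float(sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1)))


for pos, alt, delta in in_silico_mutagenesis("GTATAG", toy_motif_score):
    if delta != 0:
        print(pos, alt, delta)  # substitutions that disrupt the motif
```

Scoring mutations as differences against the reference keeps the output model-agnostic: any function mapping a sequence to a number can be plugged in as `score`.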

    Comparison of visualization methods of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia

    Data mining and knowledge discovery have been applied to datasets in various industries, including biomedical data. Modelling, data mining and visualization of biomedical data address the problem of extracting knowledge from large and complex biomedical datasets. The current challenge in dealing with such data is to develop statistical and data mining methods that search and browse the underlying patterns within the data. In this paper, we employ several state-of-the-art data reduction methods to visualize genome-wide single nucleotide polymorphism (SNP) datasets. The visualization approach was selected based on the trustworthiness of the resulting visualizations. To handle large amounts of genetic variation data, we applied different data reduction methods to address the problems induced by high dimensionality. Based on the trustworthiness metric, we found that the Neighbor Retrieval Visualizer (NeRV) outperformed the other methods. This method optimizes the retrieval quality of Stochastic Neighbor Embedding. The quality measure of the NeRV visualization showed excellent results, even though the dataset was reduced from 13917 to 2 dimensions. The visualization results will help clinicians and biomedical researchers understand the systems biology of patients and compare different groups of clusters in the visualizations. © 2008, Australian Computer Society, Inc.
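Trustworthiness penalizes points that appear among a point's k nearest neighbours in the low-dimensional map without being among its k nearest neighbours in the original space, weighted by how far down the original-space ranking they sit: T(k) = 1 - 2/(n k (2n - 3k - 1)) * sum over i of sum over intruders j of (r(i, j) - k), where r(i, j) is the rank of j among i's neighbours in the original space. Below is a plain-Python sketch of this standard definition with Euclidean distances. It is O(n^2 log n) and meant for toy data only (scikit-learn provides `sklearn.manifold.trustworthiness` for real use); the function names and example points are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _neighbour_order(points: Sequence[Point], i: int) -> List[int]:
    """Indices of all other points, nearest first (Euclidean; ties keep index order)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def trustworthiness(high: Sequence[Point], low: Sequence[Point], k: int) -> float:
    """Trustworthiness T(k) of the embedding `low` with respect to the original data `high`."""
    n = len(high)
    if len(low) != n or not 1 <= k < n / 2:
        raise ValueError("need equal point counts and 1 <= k < n/2")
    penalty = 0
    for i in range(n):
        high_order = _neighbour_order(high, i)
        rank = {j: r for r, j in enumerate(high_order, start=1)}  # rank 1 = nearest
        high_knn = set(high_order[:k])
        for j in _neighbour_order(low, i)[:k]:
            if j not in high_knn:            # j is an "intruder" in the embedding
                penalty += rank[j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


high = [(0.0,), (1.0,), (5.0,), (6.0,), (10.0,), (11.0,)]
print(trustworthiness(high, high, k=2))      # a perfect embedding scores 1.0
scrambled = [(10.0,), (1.0,), (5.0,), (6.0,), (0.0,), (11.0,)]
print(trustworthiness(high, scrambled, k=1))  # intruding neighbours lower the score
```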

    Genome-wide Association and High-resolution Phenotyping Link Oryza sativa Panicle Traits to Numerous Trait-specific QTL Clusters

    Rice panicle architecture is a key target of selection when breeding for yield and grain quality. However, panicle phenotypes are difficult to measure and susceptible to confounding during genetic mapping because of their correlation with flowering and subpopulation structure. Here we quantify 49 panicle phenotypes in 242 tropical rice accessions with the imaging platform PANorama. Using flowering as a covariate, we conduct a genome-wide association study (GWAS), detect numerous subpopulation-specific associations, and dissect multi-trait peaks using panicle phenotype covariates. Ten candidate genes in pathways known to regulate plant architecture fall under GWAS peaks, half of which overlap with quantitative trait loci identified in an experimental population. This is the first study to assess inflorescence phenotypes of field-grown material using a high-resolution phenotyping platform. Herein, we establish a panicle morphocline for domesticated rice, propose a genetic model underlying complex panicle traits, and demonstrate subtle links between panicle size and yield performance.
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2011/03110-6]; Bill and Melinda Gates Foundation; NSF Plant Genome Research Program [1026555]; NSF Graduate Research Fellowship Program (NSF-GRFP).
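A per-SNP association test with flowering as a covariate can be sketched with ordinary least squares. By the Frisch-Waugh-Lovell theorem, the SNP coefficient in the model phenotype ~ 1 + flowering + SNP equals the slope obtained after residualizing both the phenotype and the genotype on flowering. This is a simplification for illustration only: it does not model kinship or subpopulation structure, which the abstract identifies as a source of confounding, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def residualize(v: Sequence[float], c: Sequence[float]) -> List[float]:
    """Residuals of v after least-squares regression on an intercept and covariate c."""
    mv, mc = _mean(v), _mean(c)
    slope = (sum((ci - mc) * (vi - mv) for vi, ci in zip(v, c))
             / sum((ci - mc) ** 2 for ci in c))
    return [vi - mv - slope * (ci - mc) for vi, ci in zip(v, c)]


def snp_effect(phenotype: Sequence[float], genotype: Sequence[float],
               covariate: Sequence[float]) -> float:
    """SNP coefficient in phenotype ~ 1 + covariate + genotype, via Frisch-Waugh-Lovell."""
    ry = residualize(phenotype, covariate)   # phenotype with flowering effect removed
    rx = residualize(genotype, covariate)    # genotype with flowering effect removed
    return sum(a * b for a, b in zip(ry, rx)) / sum(b * b for b in rx)


# Toy data built as phenotype = 1 + 0.5 * flowering + 2 * genotype (no noise).
flowering = [1.0, 2.0, 3.0, 4.0, 5.0]
genotype = [0.0, 2.0, 0.0, 1.0, 2.0]
phenotype = [1.5, 6.0, 2.5, 5.0, 7.5]
print(snp_effect(phenotype, genotype, flowering))  # recovers the SNP effect of about 2
```

Residualizing rather than solving the full normal equations keeps the sketch dependency-free; a real GWAS would use a dedicated package that also reports standard errors and accounts for population structure.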