152 research outputs found

    A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables

    The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering

    Structural and functional multiplatform MRI series of a single human volunteer over more than fifteen years

    We present MRI data from a single human volunteer consisting of over 599 multi-contrast MR images (T1-weighted, T2-weighted, proton density, fluid-attenuated inversion recovery, T2* gradient-echo, diffusion, susceptibility-weighted, arterial-spin labelled, and resting state BOLD functional connectivity imaging) acquired in over 73 sessions on 36 different scanners (13 models, three manufacturers) over the course of 15+ years (cf. Data records). Data included planned data collection acquired within the Consortium pour l’identification précoce de la maladie Alzheimer - Québec (CIMA-Q) and Canadian Consortium on Neurodegeneration in Aging (CCNA) studies, as well as opportunistic data collection from various protocols. These multiple within- and between-centre scans over a substantial time course of a single, cognitively healthy volunteer can be useful to answer a number of methodological questions of interest to the community

    Physical mapping integrated with syntenic analysis to characterize the gene space of the long arm of wheat chromosome 1A

    Background: Bread wheat (Triticum aestivum L.) is one of the most important crops worldwide and its production faces pressing challenges, the solution of which demands genome information. However, the large, highly repetitive hexaploid wheat genome has been considered intractable to standard sequencing approaches. Therefore the International Wheat Genome Sequencing Consortium (IWGSC) proposes to map and sequence the genome on a chromosome-by-chromosome basis. Methodology/Principal Findings: We have constructed a physical map of the long arm of bread wheat chromosome 1A using chromosome-specific BAC libraries by High Information Content Fingerprinting (HICF). Two alternative methods (FPC and LTC) were used to assemble the fingerprints into a high-resolution physical map of the chromosome arm. A total of 365 molecular markers were added to the map, in addition to 1122 putative unique transcripts that were identified by microarray hybridization. The final map consists of 1180 FPC based or 583 LTC based contigs. Conclusions/Significance: The physical map presented here marks an important step forward in mapping of hexaploid bread wheat. The map is orders of magnitude more detailed than previously available maps of this chromosome, and the assignment of over a thousand putative expressed gene sequences to specific map locations will greatly assist future functional studies. This map will be an essential tool for future sequencing of and positional cloning within chromosome 1A

    Construction and characterization of two BAC libraries representing a deep-coverage of the genome of chicory (Cichorium intybus L., Asteraceae)

    Background: The Asteraceae represents an important plant family with respect to the numbers of species present in the wild and used by man. Nonetheless, genomic resources for Asteraceae species are relatively underdeveloped, hampering within species genetic studies as well as comparative genomics studies at the family level. So far, six BAC libraries have been described for the main crops of the family, i.e. lettuce and sunflower. Here we present the characterization of BAC libraries of chicory (Cichorium intybus L.) constructed from two genotypes differing in traits related to sexual and vegetative reproduction. Resolving the molecular mechanisms underlying traits controlling the reproductive system of chicory is a key determinant for hybrid development, and more generally will provide new insights into these traits, which are poorly investigated so far at the molecular level in Asteraceae. Findings: Two bacterial artificial chromosome (BAC) libraries, CinS2S2 and CinS1S4, were constructed from HindIII-digested high molecular weight DNA of the contrasting genotypes C15 and C30.01, respectively. C15 was hermaphrodite, non-embryogenic, and S2S2 for the S-locus implicated in self-incompatibility, whereas C30.01 was male sterile, embryogenic, and S1S4. The CinS2S2 and CinS1S4 libraries contain 89,088 and 81,408 clones. Mean insert sizes of the CinS2S2 and CinS1S4 clones are 90 and 120 kb, respectively, and provide together a coverage of 12.3 haploid genome equivalents. Contamination with mitochondrial and chloroplast DNA sequences was evaluated with four mitochondrial and four chloroplast specific probes, and was estimated to be 0.024% and 1.00% for the CinS2S2 library, and 0.028% and 2.35% for the CinS1S4 library. 
Using two single copy genes putatively implicated in somatic embryogenesis, screening of both libraries resulted in detection of 12 and 13 positive clones for each gene, in accordance with expected numbers. Conclusions: This indicated that both BAC libraries are valuable tools for molecular studies in chicory, one goal being the positional cloning of the S-locus in this Asteraceae species.

    Gene Duplication in the Sugarcane Genome: A Case Study of Allele Interactions and Evolutionary Patterns in Two Genic Regions

    Sugarcane (Saccharum spp.) is highly polyploid and aneuploid. Modern cultivars are derived from hybridization between S. officinarum and S. spontaneum. This combination results in a genome exhibiting variable ploidy among different loci, a huge genome size (~10 Gb) and a high content of repetitive regions. An approach using genomic, transcriptomic, and genetic mapping can improve our knowledge of the behavior of genetics in sugarcane. The hypothetical HP600 and Centromere Protein C (CENP-C) genes from sugarcane were used to elucidate the allelic expression and genomic and genetic behaviors of this complex polyploid. The physically linked side-by-side genes HP600 and CENP-C were found in two different homeologous chromosome groups with ploidies of eight and ten. The first region (Region01) was a Sorghum bicolor ortholog region with all haplotypes of HP600 and CENP-C expressed, but HP600 exhibited an unbalanced haplotype expression. The second region (Region02) was a scrambled sugarcane sequence formed from different noncollinear genes containing partial duplications of HP600 and CENP-C (paralogs). This duplication resulted in a non-expressed HP600 pseudogene and a recombined fusion version of CENP-C and the orthologous gene Sobic.003G299500 with at least two chimeric gene haplotypes expressed. It was also determined that it occurred before Saccharum genus formation and after the separation of sorghum and sugarcane. A linkage map was constructed using markers from nonduplicated Region01 and for the duplication (Region01 and Region02). We compare the physical and linkage maps, demonstrating the possibility of mapping markers located in duplicated regions with markers in nonduplicated region. Our results contribute directly to the improvement of linkage mapping in complex polyploids and improve the integration of physical and genetic data for sugarcane breeding programs. 
Thus, we describe the complexity involved in sugarcane genetics and genomics and allelic dynamics, which can be useful for understanding complex polyploid genomes

    Contrasted Patterns of Molecular Evolution in Dominant and Recessive Self-Incompatibility Haplotypes in Arabidopsis

    Self-incompatibility has been considered by geneticists a model system for reproductive biology and balancing selection, but our understanding of the genetic basis and evolution of this molecular lock-and-key system has remained limited by the extreme level of sequence divergence among haplotypes, resulting in a lack of appropriate genomic sequences. In this study, we report and analyze the full sequence of eleven distinct haplotypes of the self-incompatibility locus (S-locus) in two closely related Arabidopsis species, obtained from individual BAC libraries. We use this extensive dataset to highlight sharply contrasted patterns of molecular evolution of each of the two genes controlling self-incompatibility themselves, as well as of the genomic region surrounding them. We find strong collinearity of the flanking regions among haplotypes on each side of the S-locus together with high levels of sequence similarity. In contrast, the S-locus region itself shows spectacularly deep gene genealogies, high variability in size and gene organization, as well as complete absence of sequence similarity in intergenic sequences and striking accumulation of transposable elements. Of particular interest, we demonstrate that dominant and recessive S-haplotypes experience sharply contrasted patterns of molecular evolution. Indeed, dominant haplotypes exhibit larger size and a much higher density of transposable elements, being matched only by that in the centromere. Overall, these properties highlight that the S-locus presents many striking similarities with other regions involved in the determination of mating-types, such as sex chromosomes in animals or in plants, or the mating-type locus in fungi and green algae

    Supplementary File for Capturing wheat phenotypes at the genome level

    Supplementary S1: Yield and related traits in bread wheat. Table S1: Examples of genomic regions, candidate and cloned genes for yield and related traits in bread wheat. Supplementary S2: Drought tolerance. Table S2: Examples of genomic regions and candidate genes for drought tolerance. Supplementary S3: Heat tolerance. Table S3: Examples of genomic regions and candidate genes for heat tolerance. Supplementary S4: Salinity tolerance in bread wheat. Table S4: Examples of genomic regions and candidate genes for salinity tolerance in bread wheat. Supplementary S5: Frost tolerance. Supplementary S6: Disease resistance. Table S5: Examples of genomic regions, candidate and cloned genes mapped for disease resistance in wheat species. Supplementary S7: Insect and mite resistance. Table S6: Examples of genomic regions and candidate genes mapped for insect and mite resistance. Supplementary S8: Quality traits. Table S7: Examples of genomic regions, candidate and cloned genes for quality traits. Recent advances in next-generation sequencing (NGS) technologies have dramatically reduced the cost of DNA sequencing, allowing species with large and complex genomes to be sequenced. Although bread wheat (Triticum aestivum L.) is one of the world’s most important food crops, efficient exploitation of molecular marker-assisted breeding approaches has lagged behind that achieved in other crop species, due to its large polyploid genome. However, an international public–private effort spanning 9 years reported a draft of over 65% of the bread wheat genome in 2014 and, after more than a decade, culminated in the release of a gold-standard, fully annotated reference wheat-genome assembly in 2018. Shortly thereafter, in 2020, genome assemblies of 15 additional global wheat accessions were released. As a result, wheat has now entered the pan-genomic era, where basic resources can be efficiently exploited. 
Wheat genotyping with a few hundred markers has been replaced by genotyping arrays, capable of characterizing hundreds of wheat lines, using thousands of markers, providing fast, relatively inexpensive, and reliable data for exploitation in wheat breeding. These advances have opened up new opportunities for marker-assisted selection (MAS) and genomic selection (GS) in wheat. Herein, we review the advances and perspectives in wheat genetics and genomics, with a focus on key traits, including grain yield, yield-related traits, end-use quality, and resistance to biotic and abiotic stresses. We also focus on reported candidate genes cloned and linked to traits of interest. Furthermore, we report on the improvement in the aforementioned quantitative traits, through the use of (i) clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9)-mediated gene-editing and (ii) positional cloning methods, and of genomic selection. Finally, we examine the utilization of genomics for next-generation wheat breeding, providing a practical example of using in silico bioinformatics tools that are based on the wheat reference-genome sequence. Peer reviewed