62 research outputs found

    A method for genotyping elite breeding stocks of leaf chicory (Cichorium intybus L.) by assaying mapped microsatellite marker loci

    BACKGROUND: Leaf chicory (Cichorium intybus subsp. intybus var. foliosum L.) is a diploid plant species (2n = 18) of the Asteraceae family. The term "chicory" specifies at least two types of cultivated plants: a leafy vegetable, which is highly differentiated with respect to several cultural types, and a root crop, whose current industrial utilization primarily addresses the extraction of inulin or the production of a coffee substitute. The populations grown are generally represented by local varieties (i.e., landraces) with high variation and adaptation to the natural and anthropological environment where they originated, and have been yearly selected and multiplied by farmers. Currently, molecular genetics and biotechnology are widely utilized in marker-assisted breeding programs in this species. In particular, molecular markers are becoming essential tools for developing parental lines with traits of interest and for assessing the specific combining ability of these lines to breed F1 hybrids. RESULTS: The present research deals with the implementation of an efficient method for genotyping elite breeding stocks developed from old landraces of leaf chicory, Radicchio of Chioggia, which are locally dominant in the Veneto region, using 27 microsatellite (SSR) marker loci scattered throughout the linkage groups. Information on the genetic diversity across molecular markers and plant accessions was successfully assessed along with descriptive statistics over all marker loci and inbred lines. Our overall data support an efficient method for assessing a multi-locus genotype of plant individuals and lineages that is useful for the selection of new varieties and the certification of local products derived from Radicchio of Chioggia. 
CONCLUSIONS: This method proved useful for assessing the observed degree of homozygosity of the inbred lines as a measure of their genetic stability; it also allowed an estimate of the specific combining ability (SCA) between maternal and paternal inbred lines on the basis of their genetic diversity and the predicted degree of heterozygosity of their F1 hybrids. This information could be exploited for planning crosses and predicting plant vigor traits (i.e., heterosis) of experimental F1 hybrids on the basis of the genetic distance and allelic divergence between parental inbred lines. Knowing the parental genotypes would allow us not only to protect newly registered varieties but also to assess the genetic purity and identity of the seed stocks of commercial F1 hybrids, and to certify the origin of their food derivatives.
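
The two quantities named in this abstract, the observed homozygosity of an inbred line and the predicted heterozygosity of an F1 hybrid, can be illustrated with a minimal sketch. It is not the authors' implementation: the locus names and allele sizes are made up, and the F1 prediction assumes that each parent transmits either of its two alleles with probability 1/2 at every locus.

```python
from itertools import product


def observed_homozygosity(genotype):
    """Fraction of loci at which both alleles are identical."""
    homozygous = sum(1 for a, b in genotype.values() if a == b)
    return homozygous / len(genotype)


def predicted_f1_heterozygosity(mother, father):
    """Mean probability, over shared loci, that an F1 plant is heterozygous.

    Assumes each parent passes on either allele with probability 1/2.
    """
    loci = sorted(set(mother) & set(father))
    total = 0.0
    for locus in loci:
        # All four equally likely maternal/paternal allele combinations.
        pairs = list(product(mother[locus], father[locus]))
        total += sum(1 for m, f in pairs if m != f) / len(pairs)
    return total / len(loci)


# Hypothetical SSR allele sizes (bp) at three made-up loci.
line_a = {"ssr1": (152, 152), "ssr2": (210, 210), "ssr3": (98, 102)}
line_b = {"ssr1": (156, 156), "ssr2": (210, 210), "ssr3": (98, 98)}

print(observed_homozygosity(line_a))
print(predicted_f1_heterozygosity(line_a, line_b))
```

In this toy case `line_a` is homozygous at two of three loci, and the cross is expected to be fully heterozygous at `ssr1`, fully homozygous at `ssr2`, and heterozygous half of the time at `ssr3`.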

    Features Ranking Techniques for Single Nucleotide Polymorphism Data

    Identifying biomarkers such as single nucleotide polymorphisms (SNPs) is an important topic in biomedical applications. Such SNPs can be associated with an individual’s metabolism of drugs, which makes them targets for drug therapy and useful in personalized medicine applications. Another important application is that SNPs can be associated with an individual’s genetic predisposition to develop a disease. Identifying these associations allows proactive steps to be taken to hinder, delay or eliminate the disease. However, the problem is challenging: data are high dimensional and incomplete, and features (SNPs) are correlated. The goal of this thesis is to propose feature ranking methods that reduce the number of selected features and the computational cost required to select these features in a binary classification task. The main hypothesis is that specific values within a feature might be useful in predicting specific classes, while other values are not. In this context, three heuristic methods are applied to select the best features. The methods are applied to the Wellcome Trust Case Control Consortium (WTCCC1) dataset, and evaluated on Texas A&M University Qatar’s High Performance Computing platform. The results show that the classification accuracy achieved by the proposed methods is comparable to the baseline. However, one of the proposed methods reduced the execution time of the feature selection and the number of features required to achieve similar accuracy in the baseline by 40% and 47%, respectively.
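
The hypothesis that individual values of a feature, rather than the feature as a whole, carry the class signal can be sketched as follows. This is only one possible reading and not one of the three heuristics of the thesis, which the abstract does not describe: the score, the `min_support` parameter, and the toy genotypes (coded 0/1/2, with `None` for missing calls) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def value_purity_score(column, labels, min_support=2):
    """Best class purity reached by any single value of one feature.

    Missing genotypes (None) are skipped; values seen fewer than
    `min_support` times are ignored.
    """
    by_value = defaultdict(Counter)
    for value, label in zip(column, labels):
        if value is not None:
            by_value[value][label] += 1
    best = 0.0
    for counts in by_value.values():
        n = sum(counts.values())
        if n >= min_support:
            best = max(best, max(counts.values()) / n)
    return best


def rank_features(columns, labels, min_support=2):
    """Feature names sorted from highest to lowest score (ties by name)."""
    scores = {
        name: value_purity_score(col, labels, min_support)
        for name, col in columns.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


# Toy data: 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 2, 0, 1, None],  # value 2 occurs only in cases
    "snp_b": [0, 1, 0, 1, 0, 1],     # weak signal
    "snp_c": [1, 1, 1, 1, 1, 1],     # uninformative
}

print(rank_features(snps, labels))
```

Keeping only the top-ranked features before training a classifier is what would reduce both the feature count and the selection cost in a setup like this.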

    LANDSCAPE ECOLOGY AND POPULATION GENOMICS OF TWO SYMPATRIC PITVIPER SPECIES ACROSS A FRAGMENTED APPALACHIAN LANDSCAPE

    Understanding the link between landscape patterns and ecological and evolutionary processes is an important prerequisite for informed wildlife conservation and management, especially in rapidly changing landscapes. Until recently, the inaccessibility of spatial and genomic data sets of sufficient resolution limited our ability to incorporate the impacts of landscape patterns into predictions of ecological and environmental outcomes. In this dissertation, I utilized several high-resolution spatial and genomic data sets to address ecological questions in a rapidly fragmenting landscape in southeastern Kentucky. Overall, my results indicate that large-scale surface coal mining is causing widespread homogenization of landforms, resulting in a uniquely permanent form of habitat loss. This is likely causing significant fragmentation of the remaining forested habitat in many portions of the Cumberland Plateau of Kentucky, as evidenced by reductions in suitable overwintering habitat for the timber rattlesnake (Crotalus horridus). At the level of the individual, the high-resolution, three-dimensional imagery provided by lidar remote sensing systems allows for a much more accurate assessment of the drivers of individual movement in C. horridus than coarse topographic data sets alone. While this fragmentation might be expected to limit migration and increase genetic differentiation among populations, patterns of genomic diversity in another common pit viper, the copperhead (Agkistrodon contortrix), suggest that contemporary surface mining is not associated with spatial patterns of genomic diversity. However, using a 2,140 SNP data set, I did find significant associations between a historic highway path and divergent genomic patterns, suggesting a time lag may be responsible for contemporary genomic patterns associated with a historic barrier to movement. 
When examining the landscape at broad spatial scales, the topographic rearrangement of land after mining followed steady patterns until approximately 2011. At this point, coinciding with federal policy shifts aimed at reducing the frequency of valley fill operations, mining impacts in stream bottoms decreased markedly, but ridgetops and upper slopes continued to be impacted at rates equal to or greater than before 2011. I recommend topographic restoration be highlighted as a worthy goal of reclamation, on par with vegetation establishment and erosion control

    Clustering, Classification, and Factor Analysis in High Dimensional Data Analysis

    Clustering, classification, and factor analysis are three popular data mining techniques. In this dissertation, we investigate these methods in high dimensional data analysis. Since high dimensional data contain many more features than samples and most of the features are non-informative, dimension reduction is necessary before clustering or classification can be carried out. In the first part of this dissertation, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai, 2009), and propose to use cross-validation to select the tuning parameter. Then we develop a variation of ODC, sparse optimal discriminant clustering (SODC) for high dimensional data, by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as a dimension reduction tool for data visualization in cluster analysis. In the second part, three existing sparse principal component analysis (SPCA) methods, Lasso-PCA (L-PCA), Alternative Lasso PCA (AL-PCA), and sparse principal component analysis by choice of norm (SPCABP), are applied to a real data set, genome-wide SNP data from the International HapMap Project, for AIM selection; their classification accuracy is compared, and it is demonstrated that SPCABP outperforms the other two SPCA methods. Third, we propose a novel method called sparse factor analysis by projection (SFABP) based on SPCABP, and propose to use cross-validation for the selection of the tuning parameter and the number of factors. Our simulation studies show that SFABP performs better than unpenalized factor analysis when both are applied to classification problems.

    Classification of Breast Cancer Patients Using Somatic Mutation Profiles and Machine Learning Approaches

    The high degree of heterogeneity observed in breast cancers makes it very difficult to classify cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. In this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Identified somatic and non-synonymous single nucleotide variants were assigned a quantitative score (C-score) that represents the extent of negative impact on the function of the gene. Using these scores with a non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients among the three subgroups, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the C-scores (mutation scores) of these subgroups identified 358 genes that carry significantly higher rates of mutations in the late-stage-enriched subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup. Finally, using the identified subgroups, we also developed a supervised classification model to predict the likely stage of patients, given their mutation profiles, and hence to provide clinical insights that help devise an effective treatment plan. This study demonstrates that gene mutation profiles can be effectively used with machine-learning methods to identify clinically distinguishable subgroups of cancer patients. Genes and gene families that carry a heavy mutational load in late-stage-enriched cancer patients compared to the early-stage-enriched subgroup were also identified from functional analysis of genes. 
The classification model developed in this study could provide a reasonable prediction of the stage of cancer patients based solely on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups, and this generic methodology could also be applied to other cancer datasets.

    Integrative genetic and network approaches to identify key regulators of cardiac fibrosis

    Excessive fibrogenic response is a pathological hallmark of chronic complex diseases, including cardiovascular disease. To date, very few gene targets for cardiac fibrosis that led to effective treatments have been identified in humans. In this thesis I study and dissect the genetic component underlying cardiac fibrosis. This study integrates histomorphometric measurements of fibrosis in the rat left ventricle (LV) with gene expression (RNA-Seq from LV) and genetic data in a panel of recombinant inbred (RI) rat strains (n=30). In addition, I integrated RNA-seq LV and genetic data in humans (n=187, healthy and dilated cardiomyopathy (DCM) patients), as well as DCM genome-wide association studies (GWAS) data. I started by carrying out an unbiased co-expression network analysis in the rat heart. The reconstructed cardiac transcriptional modules were associated with quantitative levels of fibrosis. Co-expression networks were also independently built in the heart of DCM patients and by using the rat data, co-expression networks associated with fibrosis, conserved across rats and humans and not present in control human heart were prioritised. In the prioritised networks, I also analysed their cardiac cell type specificity, differential expression after TGFβ induction, potential driving transcription factors and conservation in other fibrotic diseases by analysing human data collected from other organs. Furthermore, I aimed to identify common genetic regulators of the networks (also called master genetic regulators) by using Bayesian multivariate regression approaches. Finally, I integrated GWAS data in DCM (n=2,287) to dissect the genetic basis of DCM. This systems genetics study evidences that there are transcriptional processes involved in the human cardiac fibrogenic response that are conserved across rats and humans, some of them also underlying DCM aetiology. 
In an attempt to suggest new gene targets for cardiac fibrosis, I also identified the WWP2 gene as a novel trans-acting genetic regulator of cardiac fibrosis.

    Breeding F1 Hybrid Varieties of Leaf Chicory Through Marker-Assisted Selection Schemes

    Cichorium (Cichorium intybus subsp. intybus var. foliosum L.) comprises diploid plant species (2n=18) belonging to the Asteraceae family. These species are biennial or, in the wild, perennial. They are naturally allogamous due to an efficient sporophytic self-incompatibility system. In addition, outcrossing is promoted by a floral morpho-phenology unfavorable to selfing in the absence of pollen donors (i.e., proterandry, wherein the anthers mature before the pistils) and a favorable competition of allo-pollen grains and tubes (i.e., pollen that is genetically diverse from that produced by the seed parents, usually called auto-pollen). Long appreciated as medicinal plants by the ancient Greeks and Romans, Cichorium spp. are currently among the most important cultivated vegetable crops. They are generally used as components in fresh salads or, more rarely, cooked according to local traditions and alimentary habits. Although this crop does not contribute greatly to the total agricultural income of each country, it is very important at the local level, as it characterizes the agriculture of limited areas, where from 80 to 90% of the country’s production is concentrated. This is indeed the case of Italy, where the Veneto region accounts for 66% of the national acreage and 59% of the national production of the particular type of red or variegated chicory known as “Radicchio”. Radicchio production was for a long time based on farmers’ populations, which are selected and maintained yearly and whose seed is usually reused on farm but may also be sold through private and not officially registered transactions. All these populations, obtained by mass selection and maintained through the inter-crossing of selected parents, have to be considered highly heterozygous and genetically heterogeneous; their behaviour and level of adaptation to different environments and/or cultural conditions depend on the frequency of favourable genes or gene combinations. 
In each breeding program, the selection schemes and methods that can be used, and the varietal types that can be bred, depend on plant reproductive barriers (e.g., self-incompatibility) and pollination system (e.g., allogamy), and thus on the genetic structure of populations. As a matter of fact, the strong self-incompatibility system found in chicory hinders obtaining highly homozygous parents, making it generally difficult to propose an efficient F1 seed production scheme. Despite the difficulties encountered in obtaining inbred lines by repeated selfing, the recent discovery of spontaneous male-sterile mutants increased the interest towards the production of F1 hybrid varieties. Indeed, male-sterility, or the inability of plants to produce functional pollen, is essential for the commercial production of hybrid seed by crossing parental inbred lines appropriately selected through progeny tests for assessing their specific combining ability. In this project we developed a genotyping method using molecular markers, useful for assessing the homozygosity and genetic stability of single inbred lines and for measuring the specific combining ability between maternal and paternal inbred lines on the basis of their genetic diversity. This information could be exploited for planning crosses and predicting the heterosis of experimental F1 hybrids on the basis of the allelic divergence and genetic distance of the parental lines. Knowing the parental genotypes would make it possible not only to protect newly registered varieties but also to assess the genetic purity and identity of the seed stocks of commercial F1 hybrids, and to certify the origin of their food derivatives. 
Modern marker-assisted breeding (MAB) technology based on traditional methods using molecular markers such as SSRs and SNPs, unrelated to genetic modification (GM) techniques, will now be planned and adopted for breeding vigorous and uniform F1 hybrids combining quality, uniformity, and productivity traits in the same genotypes. Furthermore, this research project deals with the discovery and genetic analysis of four male-sterile mutants in this species. These mutants, which to the best of our knowledge are the first spontaneous male-sterile mutants ever discovered and described in Radicchio, were characterized in great detail for the developmental pathway of micro-sporogenesis and gametogenesis, and the inheritance pattern of the gene underlying the male-sterility trait. A quick molecular diagnostic assay was also developed for the early marker-assisted selection of the genotype associated with male-sterile plants. The male-sterile mutants that are the object of this PhD project were demonstrated to be controlled by a single nuclear gene (ms1) that acts in the recessive state. We were able to map the male-sterility gene on a well saturated and characterized linkage group in a chromosomal region spanning 7.3 cM and 5.8 cM from the ms1 locus. On the whole, this information was crucial to plan a Genotyping-by-Sequencing experiment based on BC1 progenies with the aim of narrowing down the genomic window containing the gene for male-sterility in leaf chicory. Finally, the sequencing and assembly of the first genome draft of leaf chicory will contribute to increasing and reinforcing the reliability of Italian seed firms and local activities of the Veneto region associated with the cultivation and commercialization of Radicchio plant varieties and food products; the seed market of this species will have the chance to become highly professional and more competitive at the national and international levels. We assembled a genome draft of an estimated size of 760 Mb. 
We obtained 58,392,530 and 389,385,400 raw reads through the MiSeq and HiSeq platforms, respectively. Overall, we identified 66,785 SSR-containing regions. Original data from the bioinformatic assembly of the first genome draft of Radicchio, along with the most relevant findings that emerged from an extensive de novo gene prediction and in silico functional annotation of more than 18,000 unigenes, are critically discussed. To uncover the sequence of a given genome means to gain a robust scientific background and technological know-how, which in a short time can play a crucial role in addressing and solving issues related to the cultivation and protection of modern Radicchio varieties. In fact, we are confident that our efforts will extend the current knowledge of the genome organization and gene composition of leaf chicories, which is crucial in the development of new tools and diagnostic markers useful for our breeding strategies, and allow researchers to carry out more focused studies on chromosome regions controlling relevant agronomic traits of Radicchio. In conclusion, the present work is a sort of handbook to better understand the world of a non-model species, i.e. leaf chicory, and it is mainly directed to breeders and seed producers dealing with leaf chicory.

    MOLECULAR RESOLUTION OF MARINE NEMATODES FOR IMPROVED ASSESSMENT OF BIODIVERSITY

    Free-living nematodes are abundant in all marine habitats, highly diverse and can be important ecological indicators for monitoring anthropogenic impacts on the environment. Despite such attributes, nematode diagnostics has traditionally relied on detailed comparison of morphological characters which is often difficult and laborious, and as a result there is an increasing 'black hole' in faunal inventories where the biodiversity of groups such as nematodes is typically underestimated. Molecular methods offer a potentially efficient alternative approach to studying the biodiversity of marine nematode communities, and the main focus of this thesis was to apply molecular ecological tools for improved understanding of nematode diversity in marine and estuarine environments. Denaturing gradient gel electrophoresis (DGGE) has been evaluated as a novel tool for the identification of marine nematodes and for rapid assessment of their diversity based on amplification of the nuclear 18S rRNA gene. This approach successfully identified nematode taxa based on banding pattern and was also able to detect the most abundant taxa in samples from marine and estuarine environments. A DNA barcoding approach based on the 18S rRNA gene was applied for the first time in marine nematology, in an attempt to speed up the identification process. The success rate of this approach, across a range of nematode groups, was found to be close to 97%. A combined morphometrics and molecular approach was also undertaken to investigate cosmopolitanism and cryptic speciation by analysing populations of a cosmopolitan marine nematode, Terschellingia longicaudata, from different geographical regions. Results suggest that Terschellingia longicaudata is indeed truly cosmopolitan, with a wide geographic distribution. Two haplotypes that were divergent from most T. longicaudata were also identified in this study, indicating possible novel cryptic lineages or previously undescribed species of the genus. 
The final focus of this thesis was to develop methods for the molecular investigation of nematodes stored in formalin and other organic compounds. The effectiveness of formalin as a short term preservative was first evaluated, since this would allow morphological and molecular work to be conducted on the same specimen. Amplifiable DNA could be routinely obtained from specimens stored in formalin for periods of up to nine days. In addition the effectiveness of other organic solvents for the preservation of both molecular and morphological integrity of marine nematodes was investigated. The final part of this study developed and optimized a novel DNA extraction technique that could be employed to recover DNA from archived formalin fixed marine nematode specimens so as to carry out subsequent molecular analysis such as PCR amplification and sequencing. Plymouth Marine Laborator

    Analysis of large-scale molecular biological data using self-organizing maps

    Modern high-throughput technologies such as microarrays, next generation sequencing and mass spectrometry provide huge amounts of data per measurement and challenge traditional analyses. New strategies of data processing, visualization and functional analysis are needed. This thesis presents an approach which applies a machine learning technique known as self-organizing maps (SOMs). SOMs enable the parallel sample- and feature-centered view of molecular phenotypes combined with strong visualization and second-level analysis capabilities. We developed a comprehensive analysis and visualization pipeline based on SOMs. The unsupervised SOM mapping projects the initially high number of features, such as gene expression profiles, to meta-feature clusters of similar and hence potentially co-regulated single features. This reduction of dimension is attained by the re-weighting of primary information and does not entail a loss of primary information in contrast to simple filtering approaches. The meta-data provided by the SOM algorithm is visualized in terms of intuitive mosaic portraits. Sample-specific and common properties shared between samples emerge as a handful of localized spots in the portraits collecting groups of co-regulated and co-expressed meta-features. These characteristic color patterns reflect the data landscape of each sample and promote immediate identification of (meta-)features of interest. It will be demonstrated that SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps which can be directly compared in terms of similarities and dissimilarities. Spot-clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent step of aggregation. This spot-clustering effectively enables reduction of the dimensionality of the data in two subsequent steps towards a handful of signature modules in an unsupervised fashion. 
Furthermore, we demonstrate that analysis techniques provide enhanced resolution if applied to the meta-features. The improved discrimination power of meta-features in downstream analyses such as hierarchical clustering, independent component analysis or pairwise correlation analysis is ascribed to essentially two facts: firstly, the set of meta-features better represents the diversity of patterns and modes inherent in the data, and secondly, it also possesses better signal-to-noise characteristics than a comparable collection of single features. In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect significantly differential features between sample classes. Implementation of scoring measurements supplements the basal SOM algorithm. Further, two variants of functional enrichment analyses are introduced which link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle. Finally, case studies selected from different ‘OMIC’ realms are presented in this thesis. In particular, molecular phenotype data derived from expression microarrays (mRNA, miRNA), sequencing (DNA methylation, histone modification patterns) or mass spectrometry (proteome), and also genotype data (SNP-microarrays) are analyzed. It is shown that the SOM analysis pipeline has strong application capabilities and covers a broad range of potential purposes ranging from time series and treatment-vs.-control experiments to discrimination of samples according to genotypic, phenotypic or taxonomic classifications.
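
The mapping of single features to meta-features described in this abstract can be sketched with a generic self-organizing map. This is a textbook SOM with a Gaussian neighbourhood and linearly decaying learning rate, not the pipeline of the thesis; the grid size, the decay schedule, and the toy expression profiles (one row per gene, one column per sample, scaled to [0, 1]) are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, profile):
    """Index of the unit whose weight vector is closest to the profile."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], profile))


def train_som(profiles, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; each unit's weight vector is one meta-feature."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    sigma0 = max(rows, cols) / 2
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays to 0
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighbourhood radius shrinks
        for profile in rng.sample(profiles, len(profiles)):
            bmu = best_matching_unit(weights, profile)
            for i, (r, c) in enumerate(coords):
                d2 = (r - coords[bmu][0]) ** 2 + (c - coords[bmu][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                # Pull the unit towards the profile, scaled by grid distance.
                weights[i] = [w + lr * h * (x - w) for w, x in zip(weights[i], profile)]
    return weights, coords


def portrait(weights, rows, cols, sample_index):
    """Grid of meta-feature values for one sample (one column of the data)."""
    return [[weights[r * cols + c][sample_index] for c in range(cols)] for r in range(rows)]


# Toy expression profiles: 6 genes measured in 3 samples.
profiles = [
    [0.9, 0.8, 0.1], [1.0, 0.9, 0.0], [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9], [0.0, 0.1, 1.0], [0.2, 0.1, 0.8],
]
weights, coords = train_som(profiles, rows=2, cols=2)
print(portrait(weights, 2, 2, sample_index=0))
```

Each gene is then represented by its best-matching unit, so downstream analyses can run on the four meta-feature profiles instead of the six single-gene profiles.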

    Profiling gene expression in the brain for insight into neurological disease

    Autism Spectrum Disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one approach that is gaining popularity is the combination of gene expression and clinical genetic data, often using the SFARI Gene database, which comprises lists of curated genes considered to have causative roles in ASD when mutated in patients. We built a gene co-expression network to study the relationship between ASD-specific transcriptomic data and SFARI genes and analysed it at different levels of granularity, first as individual genes, then as clusters of genes, and finally as the complete co-expression network. No significant evidence was found of association between SFARI genes and differential gene expression patterns when comparing ASD samples to a control group, nor statistical enrichment of SFARI genes in gene co-expression network clusters that have a strong correlation with ASD diagnosis. However, classification models that incorporate topological information from the whole ASD-specific gene co-expression network can predict novel SFARI candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We demonstrate that only co-expression network analyses that integrate information from the whole network are able to reveal signatures linked to ASD diagnosis. These analyses successfully identify novel candidate genes associated with ASD where individual gene or cluster analyses fail. We also find a statistically significant association between the level of expression of SFARI genes and their SFARI Gene Score which confounds downstream analysis of ASD gene expression data. We present a novel approach to correct for this that is generalisable to other situations where analysis is affected by continuous sources of bias
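
The idea of building a gene co-expression network and reading topological information from it can be illustrated with a minimal sketch. It is not the authors' pipeline: it links genes whose Pearson correlation across samples exceeds a hypothetical threshold and reports node degree as the simplest topological feature. The gene names, expression values, and the 0.8 cut-off are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expression, threshold=0.8):
    """Gene pairs whose absolute correlation across samples reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def degrees(expression, edges):
    """Number of network neighbours of every gene."""
    degree = {gene: 0 for gene in expression}
    for g1, g2, _ in edges:
        degree[g1] += 1
        degree[g2] += 1
    return degree


# Toy expression matrix: four hypothetical genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 2.0],
}
edges = coexpression_edges(expression)
print(degrees(expression, edges))
```

Features such as degree, computed over the whole network, are the kind of input a classifier could use alongside known gene labels; the bias correction mentioned in the abstract is not sketched here because the abstract does not describe it.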