44 research outputs found

    Pareto optimal-based feature selection framework for biomarker identification

    Numerous computational techniques have been applied to identify the vital features of gene expression datasets, with the aim of increasing the efficiency of biomedical applications. The classification of microarray data samples is an important task for correctly recognising diseases by identifying a small set of clinically meaningful genes. However, identifying disease-representative genes, or biomarkers, in high-dimensional microarray gene-expression datasets remains a challenging task. This thesis investigates the viability of Pareto optimisation in identifying relevant subsets of biomarkers in high-dimensional microarray datasets. A robust Pareto Optimal based feature selection framework for biomarker discovery is then proposed. First, a two-stage feature selection approach using ensemble filter methods and Pareto Optimality is proposed. The integration of the multi-objective approach employing Pareto Optimality starts with well-known filter methods applied to various microarray gene-expression datasets. Although filter methods provide ranked lists of features, they do not give information about optimum subsets of features (genes, in this study). To address this limitation, Pareto Optimality is incorporated along with the filter methods. The robustness of the proposed framework is demonstrated on several well-known microarray gene expression datasets, where it achieves comparable predictive accuracy, or up to 100%, with comparatively fewer features. It performs better than the single-objective approaches it is compared against. Furthermore, cross-validation and k-fold approaches are integrated into the framework, which can mitigate the over-fitting problem and make the gene selection process more accurate under various conditions. The proposed framework is then developed in several phases. The Sequential Forward Selection method (SFS) is first used to represent wrapper techniques, and the developed Pareto Optimality based framework is applied multiple times and tested on different data types. Given the nature of most real-life data, imbalanced classes are examined using the proposed framework. The classifier achieves similarly high performance across the different cases using the proposed Pareto Optimal based feature selection framework, which has a novel structure for imbalanced classes. Comparable or better gene subset sizes are obtained using the proposed framework. Finally, handling missing data within the proposed framework is investigated, and it is demonstrated that different data imputation methods can also help in the effective integration of various feature selection methods.
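
The trade-off described in this abstract, high predictive accuracy against a small gene subset, can be illustrated with a minimal Pareto-front sketch. This is not the thesis's implementation: the `Candidate` type, the two objectives as coded here, and all numbers are hypothetical and chosen only to show how non-dominated subsets are kept.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate gene subset with two objectives."""
    name: str
    n_features: int   # objective 1: minimise
    accuracy: float   # objective 2: maximise


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.n_features <= b.n_features and a.accuracy >= b.accuracy
    strictly_better = a.n_features < b.n_features or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Illustrative values only (assumed, not taken from the thesis).
candidates = [
    Candidate("A", 5, 0.90),
    Candidate("B", 10, 0.95),
    Candidate("C", 10, 0.92),  # same size as B, lower accuracy
    Candidate("D", 20, 0.95),  # same accuracy as B, more features
    Candidate("E", 3, 0.80),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Under these assumed values, C and D are dropped because B matches or beats them on both objectives, while A, B and E remain as alternative trade-offs that a single-objective ranking would not expose.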

    Phylogenomics of Gesneriaceae using targeted capture of nuclear genes.

    Gesneriaceae (ca. 3400 species) is a pantropical plant family with a wide range of growth forms and floral morphologies that are associated with repeated adaptations to different environments and pollinators. Although Gesneriaceae systematics has been greatly improved by the use of Sanger sequencing data, our understanding of the evolutionary history of the group is still far from complete because this type of data provides a limited number of informative characters. To overcome this limitation, we developed a Gesneriaceae-specific gene capture kit targeting 830 single-copy loci (776,754 bp in total), including 279 genes from the Universal Angiosperms-353 kit. With an average of 557,600 reads and 87.8% gene recovery, our target capture was successful across the family Gesneriaceae and also in other families of Lamiales. From our bait set, we selected the 418 most informative loci to resolve phylogenetic relationships across the entire Gesneriaceae family using maximum likelihood and coalescent-based methods. Upon testing the phylogenetic performance of our baits on 78 taxa representing 20 out of 24 subtribes within the family, we showed that our data provided high support for the phylogenetic relationships among the major lineages, and high resolution within more recent radiations. Overall, the molecular resources we developed here open new perspectives for the study of Gesneriaceae phylogeny at different taxonomic levels and for the identification of the factors underlying the diversification of this plant group.

    Pod shattering: A homologous series of variation underlying domestication and an avenue for crop improvement

    In wild habitats, fruit dehiscence is a critical strategy for seed dispersal; however, in cultivated crops it is one of the major sources of yield loss. Therefore, indehiscence of fruits, pods, etc., was likely to be one of the first traits strongly selected in crop domestication. Even with the historical selection against dehiscence in early domesticates, it is a trait still targeted in many breeding programs, particularly in minor or underutilized crops. Here, we review dehiscence in pulse (grain legume) crops, which are of growing importance as a source of protein in human and livestock diets, and which have received less attention than cereal crops and the model plant Arabidopsis thaliana. We specifically focus on (i) the history of indehiscence in domestication across legumes, (ii) the structures and mechanisms involved in shattering, (iii) the molecular pathways underlying this important trait, (iv) the extent of crop losses due to shattering and the effects of environmental factors on shattering, and (v) efforts to reduce shattering in crops. While our focus is mainly pulse crops, we also include comparisons to crucifers and cereals because there is extensive research on shattering in these taxa.

    Macroevolution of the plant–hummingbird pollination system

    Plant–hummingbird interactions are considered a classic example of coevolution, a process in which mutually dependent species influence each other's evolution. Plants depend on hummingbirds for pollination, whereas hummingbirds rely on nectar for food. As a step towards understanding coevolution, this review focuses on the macroevolutionary consequences of plant–hummingbird interactions, a relatively underexplored area in the current literature. We synthesize prior studies, illustrating the origins and dynamics of hummingbird pollination across different angiosperm clades previously pollinated by insects (mostly bees), bats, and passerine birds. In some cases, the crown age of hummingbirds pre‐dates the plants they pollinate. In other cases, plant groups transitioned to hummingbird pollination early in the establishment of this bird group in the Americas, with the build‐up of both diversities coinciding temporally, and hence suggesting co‐diversification. Determining what triggers shifts to and away from hummingbird pollination remains a major open challenge. The impact of hummingbirds on plant diversification is complex, with many tropical plant lineages experiencing increased diversification after acquiring flowers that attract hummingbirds, and others experiencing no change or even a decrease in diversification rates. This mixed evidence suggests that other extrinsic or intrinsic factors, such as local climate and isolation, are important covariables driving the diversification of plants adapted to hummingbird pollination. To guide future studies, we discuss the mechanisms and contexts under which hummingbirds, as a clade and as individual species (e.g. traits, foraging behaviour, degree of specialization), could influence plant evolution. We conclude by commenting on how macroevolutionary signals of the mutualism could relate to coevolution, highlighting the unbalanced focus on the plant side of the interaction, and advocating for the use of species‐level interaction data in macroevolutionary studies.

    The diversity and evolution of pollination systems in large plant clades: Apocynaceae as a case study

    Background and Aims: Large clades of angiosperms are often characterized by diverse interactions with pollinators, but how these pollination systems are structured phylogenetically and biogeographically is still uncertain for most families. Apocynaceae is a clade of >5300 species with a worldwide distribution. A database representing >10 % of species in the family was used to explore the diversity of pollinators and evolutionary shifts in pollination systems across major clades and regions. Methods: The database was compiled from published and unpublished reports. Plants were categorized into broad pollination systems and then subdivided to include bimodal systems. These were mapped against the five major divisions of the family, and against the smaller clades. Finally, pollination systems were mapped onto a phylogenetic reconstruction that included those species for which sequence data are available, and transition rates between pollination systems were calculated. Key Results: Most Apocynaceae are insect pollinated with few records of bird pollination. Almost three-quarters of species are pollinated by a single higher taxon (e.g. flies or moths); 7 % have bimodal pollination systems, whilst the remaining approx. 20 % are insect generalists. The less phenotypically specialized flowers of the Rauvolfioids are pollinated by a more restricted set of pollinators than are more complex flowers within the Apocynoids + Periplocoideae + Secamonoideae + Asclepiadoideae (APSA) clade. Certain combinations of bimodal pollination systems are more common than others. Some pollination systems are missing from particular regions, whilst others are over-represented. Conclusions: Within Apocynaceae, interactions with pollinators are highly structured both phylogenetically and biogeographically. Variation in transition rates between pollination systems suggests constraints on their evolution, whereas regional differences point to environmental effects such as filtering of certain pollinators from habitats. This is the most extensive analysis of its type so far attempted and gives important insights into the diversity and evolution of pollination systems in large clades.

    The Effects of Dispersal and Pollination on Plantaginaceae Diversification

    The rich diversity of flowering plants can be explained by a variety of mechanisms, including geographical distribution, range expansion, and floral variance, which correlates with different biotic pollination forms. Plantaginaceae is an ideal model for examining the mechanisms that generate angiosperm diversity, as the family has diverse distribution patterns in both the Old World and the New World, and has representatives of many different pollination syndromes. Using molecular phylogenetics, ancestral reconstructions, and phylogenetic modeling and hypothesis testing, this study aimed to investigate the factors affecting the macroevolution of the angiosperm family Plantaginaceae. With 683 species from 72 genera, and a total of 6996 characters from 5 different molecular markers, the phylogenetic reconstruction revealed that Plantaginaceae have 12 strongly supported monophyletic tribes. The family was inferred to have a New World origin, and to have experienced several long-distance dispersal events between the Old World and the New World. In some cases, these long-distance dispersals were linked to chromosome number changes in the family. Sympatric speciation was shown to be a significant diversification mode in the family, which had some heterogeneity in terms of speciation rates among the tribes. These diversification patterns were not correlated with geographic distribution, as diversification rates in the Old World and the New World were similar. However, long-distance dispersals were found to be the main drivers of speciation within the family. Lastly, pollination was shown to have no effect on diversification in the tribe Antirrhineae. In summary, this study investigated the diversification patterns within the diverse angiosperm family Plantaginaceae. Since its origin in the New World approximately 48.81 mya, the family has experienced several long-distance dispersal events between the Old World and the New World. Along with the changes in chromosome numbers, long-distance dispersal was found to be a strong contributor to the diversity in the family.

    Diversification in Monkeyflowers: An Investigation of the Effects of Elevation and Floral Color in the Genus Mimulus

    The vast diversity of floral colours in many flowering plant families, paired with the observation of preferences among pollinators, suggests that floral colour may be involved in the process of speciation in flowering plants. While transitions in floral colour have been examined in numerous genera, we have very little information on the consequences of floral colour transitions for the evolutionary success of a clade. Overlaid upon these patterns is the possibility that certain floral colours are more prevalent in certain environments, with the causes of differential diversification being more directly determined by geographical distribution. Here we examine transition rates to anthocyanin + carotenoid rich (red/orange/fuchsia) flowers and examine whether red/orange flowers are associated with differences in speciation and/or extinction rates in Mimulus. Because it has been suggested that reddish flowers are more prevalent at high elevation, we also examine the macroevolutionary evidence for this association and determine if there is evidence for differential diversification at high elevations. We find that, while red/orange clades have equivalent speciation rates, the trait state of reddish flowers reverts more rapidly to the nonreddish trait state. Moreover, there is evidence for high speciation rates at high elevation and no evidence that transition rates in floral colour differ with elevation.