17 research outputs found

    Exploring the stability of feature selection methods across a palette of gene expression datasets

    Gene expression data often need to be classified into classes or grouped into clusters for further analysis using different machine learning techniques, and an important pre-processing step is feature selection (FS). The aim of this study is to investigate the stability of several diverse FS methods on a plethora of microarray gene expression datasets. This experimental work is broken into three steps. Step 1 involves running some FS methods on one gene expression dataset to obtain a preliminary assessment of the similarity, or dissimilarity, of the resulting feature subsets across methods. Step 2 involves running two of these methods on a large number of different datasets to investigate whether the results produced by the methods depend on the characteristics of the dataset: binary, multiclass, small or large. The final step explores how the similarity of selected feature subsets between pairs of methods evolves as the size of the subsets is increased. Results show that the studied methods display a high amount of variability in terms of the resulting selected features. The feature subsets differed both between methods (inter-method) and within a method (intra-method) for different datasets. The reason behind this is not yet clear and is being further investigated. The final objective of the research, namely to define how to select an FS method, is ongoing work whose initial findings are reported herein.

    A comparative study of feature selection methods for biomarker discovery

    A major area of research is biomarker discovery using gene expression data. Such data are large and often need to be classified into classes or clustered, using different machine learning techniques, for further analysis. An important preprocessing step is feature selection (FS), for which many methods have been devised. However, applying different FS techniques to the same dataset does not always produce the same results. In this work, the robustness of FS methods is examined. Robustness is defined here as the stability of a given gene pool with respect to the data and the FS method used. Our approach is to investigate the resulting feature subset obtained when running diverse FS methods on different gene expression datasets. As a first step, 10 FS methods were executed using 2 different datasets. Based on the results obtained, 2 of these methods were further investigated using 10 different datasets. The effect of selecting an increasing number of features on the percentage similarity between methods was also studied. Our results show that the studied methods exhibit a high amount of variability in the resulting feature subset. The selected feature subsets differed both between methods (inter-method) and within a method (intra-method) for different datasets. The reason behind this is not clear, and how to objectively assess the ideal (best) subset should be further investigated.

    H3ABioNet computational metagenomics workshop in Mauritius : training to analyse microbial diversity for Africa

    In the context of recent international initiatives to bolster genomics research for Africa, and more specifically to develop bioinformatics expertise and networks across the continent, a workshop on computational metagenomics was organized at the end of 2014 at the University of Mauritius. The workshop offered background on various aspects of computational biology, including databases and algorithms, sequence analysis fundamentals, metagenomics concepts and tools, practical exercises, journal club activities and research seminars. We have discovered a strong interest in metagenomics research across Africa, to advance practical applications both for human health and the environment. We have also realized the great potential to develop genomics and bioinformatics through collaborative efforts across the continent, and the need to further reinforce the untapped human potential and explore the natural resources for stronger engagement of local scientific communities, with a view to contributing towards the improvement of human health and well-being for the citizens of Africa.

    Polygenic risk scores for disease risk prediction in Africa: current challenges and future directions

    Early identification of genetic risk factors for complex diseases can enable timely interventions and prevent serious outcomes, including mortality. While the genetics underlying many Mendelian diseases have been elucidated, it is harder to predict risk for complex diseases arising from the combined effects of many genetic variants with smaller individual effects on disease aetiology. Polygenic risk scores (PRS), which combine multiple contributing variants to predict disease risk, have the potential to influence the implementation of precision medicine. However, the majority of existing PRS were developed from European data with limited transferability to African populations. Notably, African populations have diverse genetic backgrounds, and a genomic architecture with smaller haplotype blocks compared to European genomes. Subsequently, growing evidence shows that using large-scale African ancestry cohorts as discovery for PRS development may generate more generalizable findings. Here, we (1) discuss the factors contributing to the poor transferability of PRS in African populations, (2) showcase the novel African genomic datasets for PRS development, (3) explore the potential clinical utility of PRS in African populations, and (4) provide insight into the future of PRS in Africa.

    Genetic Diversity of the Ralstonia solanacearum Species Complex in the Southwest Indian Ocean Islands

    Epidemiological surveillance of plant pathogens based on genotyping methods is mandatory to improve disease management strategies. In the Southwest Indian Ocean (SWIO) islands, bacterial wilt (BW) caused by the Ralstonia solanacearum species complex (RSSC) is hampering the production of many sustainable and cash crops. To thoroughly analyze the genetic diversity of the RSSC in the SWIO, we performed a wide sampling survey (in Comoros, Mauritius, Reunion, Rodrigues, and Seychelles) that yielded 1,704 isolates from 129 plots, mainly from solanaceous crops. Classification of the isolates to the four major RSSC phylogenetic groups, named phylotypes, showed that 87% were phylotype I, representing the most prevalent strain in each of the SWIO islands. Additionally, 9.7% were phylotype II, and 3.3% were phylotype III; however, these isolates were found only in Reunion. Phylotype IV (2 isolates), known to be restricted to Indonesia-Australia-Japan, was reported in Mauritius, representing the first report of this group in the SWIO. Partial endoglucanase (egl) sequencing, based on the selection of 145 isolates covering the geographic and host diversity in the SWIO (also including strains from Mayotte and Madagascar), revealed 14 sequevars with Reunion and Mauritius displaying the highest sequevar diversity. Through a multilocus sequence analysis (MLSA) scheme based on the partial sequencing of 6 housekeeping genes (gdhA, gyrB, rplB, leuS, adk, and mutS) and 1 virulence-associated gene (egl), we inferred the phylogenetic relationships between these 145 SWIO isolates and 90 worldwide RSSC reference strains. Phylotype I was the most recombinogenic, although recombination events were detected among all phylotypes. A multilocus sequence typing (MLST) scheme identified 29 sequence types (STs) with variable geographic distributions in the SWIO. The outstanding epidemiologic feature was STI-13 (sequevar I-31), which was overrepresented in the SWIO and obviously reflected a lineage strongly adapted to the SWIO environment. A goeBURST analysis identified eight clonal complexes (CCs) including SWIO isolates, four CCs being geographically restricted to the SWIO, and four CCs being widespread beyond the SWIO. This work, which highlights notable genetic links between African and SWIO strains, provides a basis for the epidemiological surveillance of RSSC and will contribute to BW management in the SWIO.