4 research outputs found

    Development of Bioinformatics Infrastructure for Genomics Research in H3Africa

    Get PDF
    Background: Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet’s role has evolved in response to changing needs from the consortium and the African bioinformatics community. Objectives: H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Methods and Results: Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. Conclusions: For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet, have significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa

    Investigating on data augmentation and generative adversarial networks (GAN s) for diabetic retinopathy

    No full text
    The early detection of diabetic retinopathy (DR) disease, which is a complication of diabetes, may help to reduce blindness. One of the issues faced currently is the limited number of ophthalmologists available to take care of the diabetes patients. To overcome this problem, automated DR grading applications are required. Many researchers are now developing deep learning to build these decision support systems. Nevertheless, the frequent misclassification of DR remains an open challenge. Class imbalance in the datasets and limited labelled datasets are the root causes of misclassification. In this work, we are investigating data augmentation and Generative Adversarial Networks (GANs), known to countermeasure class imbalance and the dearth of labelled images. We have applied these two techniques on the APTOS dataset after the undersampling process. Traditional data augmentation has achieved an accuracy of 72 % versus GAN which has reached an accuracy of 76%

    Balancing data through data augmentation improves the generality of transfer learning for diabetic retinopathy classification

    No full text
    The incidence of diabetes in Mauritius is amongst the highest in the world. Diabetic retinopathy (DR), a complication resulting from the disease, can lead to blindness if not detected early. The aim of this work was to investigate the use of transfer learning and data augmentation for the classification of fundus images into five different stages of diabetic retinopathy. The five stages are No DR, Mild nonproliferative DR, Moderate nonproliferative DR, Severe nonproliferative DR and Proliferative. To this end, deep transfer learning and three pre-trained models, VGG16, ResNet50 and DenseNet169, were used to classify the APTOS dataset. The preliminary experiments resulted in low training and validation accuracies, and hence, the APTOS dataset was augmented while ensuring a balance between the five classes. This dataset was then used to train the three models, and the best three models were used to classify a blind Mauritian test datum. We found that the ResNet50 model produced the best results out of the three models and also achieved very good accuracies for the five classes. The classification of class-4 Mauritian fundus images, severe cases, produced some unexpected results, with some images being classified as mild, and therefore needs to be further investigated

    Stability of feature selection methods ::a study of metrics across different gene expression datasets

    No full text
    Analysis of gene-expression data often requires that a gene (feature) subset is selected and many feature selection (FS) methods have been devised. However, FS methods often generate different lists of features for the same dataset and users then have to choose which list to use. One approach to support this choice is to apply stability metrics on the generated lists and selecting lists on that base. The aim of this study is to investigate the behavior of stability metrics applied to feature subsets generated by FS methods. The experiments in this work explore a plethora of gene expression datasets, FS methods, and expected number of features to compare several stability metrics. The stability metrics have been used to compare five feature selection methods (SVM, SAM, ReliefF, RFE + RF and LIMMA) on gene expression datasets from the EBI repository. Results show that the studied stability metrics display a high amount of variability. The reason behind this is not clear yet and is being further investigated. The final objective of the research, that is to define how to select a FS method, is an ongoing work whose partial findings are reported herein
    corecore