27 research outputs found

    Reducing the Time Requirement of k-Means Algorithm

    Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which combine many data points with a large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k. The problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and k-means clustering. We provide a correctness proof for this algorithm. Results obtained from testing the algorithm on three biological datasets and six non-biological datasets (three of these datasets are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm's clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARIHA). We found that when k is close to d, the quality is good (ARIHA > 0.8), and when k is not close to d, the quality of our new k-means algorithm is excellent (ARIHA > 0.9). In this paper, the emphasis is on reducing the time requirement of the k-means algorithm and on its application to microarray data, motivated by the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological datasets.
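The problem statement in the abstract above (choose k centers that minimize the mean squared distance from each point to its nearest center) can be illustrated with a minimal sketch. This is the textbook Lloyd iteration, not the PCA-based algorithm the abstract describes, and the function names and toy data are hypothetical, chosen only for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to `point` (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def mean_sq_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)
    return total / len(points)


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Standard Lloyd iteration (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct data points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[i] for m in members) / len(members) for i in range(dim)]
                )
            else:
                # Keep the old center if a cluster lost all its members.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data: two well-separated groups of three points in R^2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd_kmeans(points, k=2)
objective = mean_sq_distance(points, centers)
```

Each iteration costs on the order of n * k * d distance operations, which is why large n and d make the traditional algorithm expensive and why approaches that reduce the work per iteration are attractive.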

    Expanding Research Capacity in Sub-Saharan Africa Through Informatics, Bioinformatics, and Data Science Training Programs in Mali

    Bioinformatics and data science research have boundless potential across Africa due to its high levels of genetic diversity and disproportionate burden of infectious diseases, including malaria, tuberculosis, HIV and AIDS, Ebola virus disease, and Lassa fever. This work lays out an incremental approach for reaching underserved countries in bioinformatics and data science research through a progression of capacity building, training, and research efforts. Two global health informatics training programs sponsored by the Fogarty International Center (FIC) were carried out at the University of Sciences, Techniques and Technologies of Bamako, Mali (USTTB) between 1999 and 2011. Together with capacity building efforts through the West Africa International Centers of Excellence in Malaria Research (ICEMR), this progress laid the groundwork for a bioinformatics and data science training program launched at USTTB as part of the Human Heredity and Health in Africa (H3Africa) initiative. Prior to the global health informatics training, its trainees published first or second authorship and third or higher authorship manuscripts at rates of 0.40 and 0.10 per year, respectively. Following the training, these rates increased to 0.70 and 1.23 per year, respectively, which was a statistically significant increase (p < 0.001). The bioinformatics and data science training program at USTTB commenced in 2017 focusing on student, faculty, and curriculum tiers of enhancement. The program’s sustainable measures included institutional support for core elements, university tuition and fees, resource sharing and coordination with local research projects and companion training programs, increased student and faculty publication rates, and increased research proposal submissions. 
    Challenges included reliance of high-speed bandwidth availability on short-term funding, lack of a discounted software portal for basic software applications, protracted application processes for United States visas, lack of industry job positions, and low publication rates in the areas of bioinformatics and data science. Long-term, incremental processes are necessary for engaging historically underserved countries in bioinformatics and data science research. The multi-tiered enhancement approach laid out here provides a platform for generating bioinformatics and data science technicians, teachers, researchers, and program managers. Increased literature on bioinformatics and data science training approaches and progress is needed to provide a framework for establishing benchmarks on the topics.

    Correction to: Partnership for Research on Ebola VACcination (PREVAC): protocol of a randomized, double-blind, placebo-controlled phase 2 clinical trial evaluating three vaccine strategies against Ebola in healthy volunteers in four West African countries.

    Following the publication of the original article [1], we were notified of an error in the affiliation of three authors of the article: Celine Roy, Laura Richert and Genevieve Chene. Their affiliation was initially given as: “Partnership for Research on Ebola Virus in Liberia (PREVAIL), Monrovia, Liberia”. However, their correct affiliation is: Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, CHU Bordeaux, CIC 1401, EUCLID/F-CRIN Clinical Trials Platform, F-33000, Bordeaux, France.

    A year of genomic surveillance reveals how the SARS-CoV-2 pandemic unfolded in Africa.

    The progression of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic in Africa has so far been heterogeneous, and the full impact is not yet well understood. In this study, we describe the genomic epidemiology using a dataset of 8746 genomes from 33 African countries and two overseas territories. We show that the epidemics in most countries were initiated by importations predominantly from Europe, which diminished after the early introduction of international travel restrictions. As the pandemic progressed, ongoing transmission in many countries and increasing mobility led to the emergence and spread within the continent of many variants of concern and interest, such as B.1.351, B.1.525, A.23.1, and C.1.1. Although distorted by low sampling numbers and blind spots, the findings highlight that Africa must not be left behind in the global pandemic response, otherwise it could become a source for new variants

    Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples

    We describe the MalariaGEN Pf7 data resource, the seventh release of Plasmodium falciparum genome variation data from the MalariaGEN network.  It comprises over 20,000 samples from 82 partner studies in 33 countries, including several malaria endemic regions that were previously underrepresented.  For the first time we include dried blood spot samples that were sequenced after selective whole genome amplification, necessitating new methods to genotype copy number variations.  We identify a large number of newly emerging crt mutations in parts of Southeast Asia, and show examples of heterogeneities in patterns of drug resistance within Africa and within the Indian subcontinent.  We describe the profile of variations in the C-terminal of the csp gene and relate this to the sequence used in the RTS,S and R21 malaria vaccines.  Pf7 provides high-quality data on genotype calls for 6 million SNPs and short indels, analysis of large deletions that cause failure of rapid diagnostic tests, and systematic characterisation of six major drug resistance loci, all of which can be freely downloaded from the MalariaGEN website

    The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance.

    Investment in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing in Africa over the past year has led to a major increase in the number of sequences that have been generated and used to track the pandemic on the continent, a number that now exceeds 100,000 genomes. Our results show an increase in the number of African countries that are able to sequence domestically and highlight that local sequencing enables faster turnaround times and more-regular routine surveillance. Despite limitations of low testing proportions, findings from this genomic surveillance study underscore the heterogeneous nature of the pandemic and illuminate the distinct dispersal dynamics of variants of concern (particularly Alpha, Beta, Delta, and Omicron) on the continent. Sustained investment for diagnostics and genomic surveillance in Africa is needed as the virus continues to evolve while the continent faces many emerging and reemerging infectious disease threats. These investments are crucial for pandemic preparedness and response and will serve the health of the continent well into the 21st century.

    Long-term cellular immunity of vaccines for Zaire Ebola Virus Diseases

    Recent Ebola outbreaks underscore the importance of continuous prevention and disease control efforts. Authorized vaccines include Merck’s Ervebo (rVSV-ZEBOV) and Johnson & Johnson’s two-dose combination (Ad26.ZEBOV/MVA-BN-Filo). Here, in a five-year follow-up of the PREVAC randomized trial (NCT02876328), we report the results of the immunology ancillary study of the trial. The primary endpoint is to evaluate long-term memory T-cell responses induced by three vaccine regimens: Ad26–MVA, rVSV, and rVSV–booster. Polyfunctional EBOV-specific CD4+ T-cell responses increase after Ad26 priming and are further boosted by MVA, whereas minimal responses are observed in the rVSV groups, declining after one year. In-vitro expansion for eight days shows sustained EBOV-specific T-cell responses for up to 60 months post-prime vaccination with both Ad26-MVA and rVSV, with no decline. Cytokine production analysis identifies shared biomarkers between the Ad26-MVA and rVSV groups. As a secondary endpoint, we observe an elevation of pro-inflammatory cytokines at Day 7 in the rVSV group. Finally, we establish a correlation between EBOV-specific T-cell responses and anti-EBOV IgG responses. Our findings can guide booster vaccination recommendations and help identify populations likely to benefit from revaccination.