    Analyzing Socio-Academic Factors and Predictive Modeling of Student Performance Using Machine Learning Techniques

    Understanding the factors that influence student performance is crucial for improving educational outcomes. While the impact of socio-economic and psychological factors on student performance has been studied, less is known about how students' personal attitudes and behaviors across different departments and activities correlate with their academic success; this study aims to address that gap. It employs exploratory data analysis (EDA) to identify trends and relationships within the dataset. Machine learning techniques, such as K-means clustering and Long Short-Term Memory (LSTM) networks, are used to model and predict student performance based on students' reported behaviors and preferences. The dimensionality of the dataset is reduced using Principal Component Analysis (PCA) to improve the clustering process. The findings suggest significant variations in academic performance based on departmental affiliation, gender, and engagement in certification courses. The LSTM model achieved an accuracy of 91% on the test set. However, the classification report reveals that while the model was highly effective in identifying the majority class (label 1), with a precision of 91% and a recall of 100%, it failed to correctly predict any instance of the minority class (label 0); the headline accuracy therefore largely reflects the class imbalance rather than an ability to distinguish the two classes. The insights from this study could help educators tailor interventions to the specific needs of students based on their behaviors and departmental affiliations, leading to more personalized education strategies and potentially improving academic outcomes. DOI: 10.28991/ESJ-2024-08-04-05
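
    The gap between 91% accuracy and zero minority-class recall can be reproduced with a few lines of arithmetic. The sketch below assumes a hypothetical test set of 91 label-1 and 9 label-0 instances (the abstract does not report the actual class counts) and a classifier that always predicts label 1; it is an illustration, not the study's evaluation code. Balanced accuracy, the mean of the per-class recalls, is added as one commonly used alternative that is not inflated by class imbalance.

```python
# Hypothetical labels: 91 majority-class (1) and 9 minority-class (0) instances.
y_true = [1] * 91 + [0] * 9
# A degenerate classifier that predicts the majority class for every instance.
y_pred = [1] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class; 0.0 when a ratio is undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, which is insensitive to class imbalance."""
    labels = sorted(set(y_true))
    return sum(precision_recall(y_true, y_pred, c)[1] for c in labels) / len(labels)


print(accuracy(y_true, y_pred))             # 0.91
print(precision_recall(y_true, y_pred, 1))  # (0.91, 1.0)
print(precision_recall(y_true, y_pred, 0))  # (0.0, 0.0)
print(balanced_accuracy(y_true, y_pred))    # 0.5
```

    With recall of 100% on label 1 and 0% on label 0, every prediction must be label 1, so accuracy and label-1 precision both equal the majority-class share, while balanced accuracy falls to chance level.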

    Evaluating the performance of tools used to call minority variants from whole genome short-read data.

    Background: High-throughput whole genome sequencing facilitates the investigation of minority virus sub-populations in virus-positive samples. Minority variants are useful for understanding within-host and between-host diversity and population dynamics, and can potentially help elucidate person-to-person transmission pathways. Several minority variant callers have been developed to describe low-frequency sub-populations from whole genome sequence data. These callers differ in the bioinformatics and statistical methods they use to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance of, and concordance between, published minority variant callers used to identify minority variants from whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict, and VarScan2. The callers' agreement in identifying known variants was quantified using two measures: concordance accuracy and inter-caller concordance. Results: The variant callers differed in the minority variants they identified from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants, although it showed variable sensitivity and precision and a high false positive rate relative to the other callers, which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistency in the quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools is useful for filtering errors when calling low-frequency variants.
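
    The two agreement measures and the "at least three tools" recommendation can be sketched on set-valued call lists. The abstract does not define concordance accuracy or inter-caller concordance formally, so the definitions below are assumptions: concordance accuracy is taken as the fraction of spiked variants a call set recovers, and inter-caller concordance as the mean pairwise Jaccard index. All variant calls and positions are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical call sets from four callers; each variant is (position, ref, alt).
calls = {
    "FreeBayes": {(101, "A", "G"), (250, "C", "T"), (400, "G", "A"), (777, "T", "C")},
    "LoFreq": {(101, "A", "G"), (250, "C", "T")},
    "VarDict": {(101, "A", "G"), (250, "C", "T"), (400, "G", "A")},
    "VarScan2": {(101, "A", "G"), (400, "G", "A")},
}
# Hypothetical truth set: the variants spiked into the simulated reads.
truth = {(101, "A", "G"), (250, "C", "T"), (400, "G", "A"), (512, "T", "A")}


def concordance_accuracy(called, truth):
    """Assumed definition: fraction of spiked variants recovered by a call set."""
    return len(called & truth) / len(truth)


def jaccard(a, b):
    """Overlap between two call sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def inter_caller_concordance(calls):
    """Assumed definition: mean pairwise Jaccard index across all caller pairs."""
    pairs = list(combinations(calls.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def consensus(calls, min_callers=3):
    """Keep only variants reported by at least `min_callers` tools."""
    support = Counter(v for called in calls.values() for v in called)
    return {v for v, n in support.items() if n >= min_callers}


agreed = consensus(calls)
print(sorted(agreed))                       # [(101, 'A', 'G'), (250, 'C', 'T'), (400, 'G', 'A')]
print(concordance_accuracy(agreed, truth))  # 0.75
```

    In this toy example the consensus filter removes the FreeBayes-only call at position 777 while keeping every variant reported by three or more callers. The spiked variant at position 512 stays missing because no caller reported it, so consensus filtering can improve precision but cannot restore sensitivity.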

    Enrichment approach for unbiased sequencing of respiratory syncytial virus directly from clinical samples

    Background: Nasopharyngeal samples contain higher quantities of bacterial and host nucleic acids relative to viruses, presenting challenges for virus metagenomic sequencing, which underpins agnostic sequencing protocols. We aimed to develop a viral enrichment protocol for unbiased whole-genome sequencing of respiratory syncytial virus (RSV) from nasopharyngeal samples using the Oxford Nanopore Technologies (ONT) MinION platform. Methods: We assessed two protocols using RSV-positive samples. Protocol 1 involved physical pre-treatment of samples by centrifugal processing before RNA extraction, while Protocol 2 entailed direct RNA extraction without prior enrichment. Concentrates from Protocol 1 and RNA extracts from Protocol 2 were each divided into two fractions; one was DNase treated while the other was not. RNA was then extracted from both concentrate fractions per sample, and RNA from both protocols was converted to cDNA, which was amplified with the tagged Endoh primers using the Sequence-Independent Single-Primer Amplification (SISPA) approach, followed by library preparation and sequencing. Statistical significance was tested using the Wilcoxon signed-rank test. Results: DNase-treated fractions from both protocols showed significantly less host and bacterial contamination than the untreated fractions (p<0.01 in each protocol). Additionally, DNase treatment after RNA extraction (Protocol 2) reduced host and bacterial reads more than treatment before extraction (Protocol 1). However, neither protocol yielded whole RSV genomes. Sequenced reads mapped to parts of the nucleoprotein (N gene) and polymerase complex (L gene) from Protocols 1 and 2, respectively. Conclusions: DNase treatment was most effective in reducing host and bacterial contamination, and it was more effective when performed after RNA extraction than before. We attribute the incomplete genome segments to amplification biases resulting from the use of a short random sequence (6 bases) in the tagged Endoh primers. Increasing the length of the random sequence from six nucleotides to nine or 12 in future studies may reduce these coverage biases.
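
    The Wilcoxon signed-rank test mentioned in the Methods compares paired measurements, here untreated versus DNase-treated fractions of the same sample. A minimal sketch of the statistic and an exact two-sided p-value follows; the paired values are hypothetical, not the study's data, and in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def signed_ranks(x, y):
    """Return (difference, rank) pairs for paired samples x and y.

    Zero differences are dropped; tied absolute differences receive the
    average of the 1-based ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return list(zip(diffs, ranks))


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n), where n counts non-zero differences."""
    pairs = signed_ranks(x, y)
    w_plus = sum(r for d, r in pairs if d > 0)
    w_minus = sum(r for d, r in pairs if d < 0)
    return w_plus, w_minus, len(pairs)


def exact_p_two_sided(x, y):
    """Exact two-sided p-value by enumerating all 2**n sign assignments."""
    ranks = [r for _, r in signed_ranks(x, y)]
    total = sum(ranks)
    w_plus, _, n = wilcoxon_signed_rank(x, y)
    t_obs = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if min(w, total - w) <= t_obs + 1e-9:
            extreme += 1
    return extreme / 2 ** n


# Hypothetical paired values, e.g. % host reads per sample without vs with DNase.
untreated = [80, 75, 90, 60, 85, 70]
treated = [40, 50, 45, 60, 30, 95]
print(wilcoxon_signed_rank(untreated, treated))  # (13.5, 1.5, 5)
print(exact_p_two_sided(untreated, treated))     # 0.1875
```

    Enumerating all 2^n sign assignments is only practical for a small number of pairs; larger samples typically rely on a normal approximation.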

    Detection of SARS-CoV-2 variant 501Y.V2 in Comoros Islands in January 2021 [version 1; peer review: 2 approved]

    Background. Genomic data are key to understanding the spread and evolution of the SARS-CoV-2 pandemic and to informing the design and evaluation of interventions. However, SARS-CoV-2 genomic data remain scarce across Africa, with no reports yet from the Indian Ocean islands. Methods. We sequenced the genomes of six SARS-CoV-2-positive samples from the first major infection wave in the Union of Comoros in January 2021 and undertook a detailed phylogenetic analysis. Results. All six recovered genomes were classified within the 501Y.V2 variant of concern (also known as lineage B.1.351) and appeared to belong to two sub-clusters, with the most recent common ancestor dated 30 October 2020 (95% credible interval: 6 September 2020 to 10 December 2020). Comparison of the Comoros genomes with 501Y.V2 genomes from other countries deposited in the GISAID database revealed a close association with viruses identified in France and Mayotte (part of the Comoros archipelago and an overseas department of France). Conclusions. The recovered genomes, albeit few, confirmed local transmission following what were probably multiple introductions of the SARS-CoV-2 501Y.V2 variant of concern during the first major COVID-19 wave in the Comoros. These findings demonstrate the importance of genomic surveillance and have implications for ongoing control strategies on the islands.

    Tracking the introduction and spread of SARS-CoV-2 in coastal Kenya

    Genomic surveillance of SARS-CoV-2 is important for understanding both the evolution and the patterns of local and global transmission. Here, we generated 311 SARS-CoV-2 genomes from samples collected in coastal Kenya between 17 March and 31 July 2020. We estimated that multiple independent SARS-CoV-2 introductions into the region were primarily of European origin, although these introductions could have come through neighbouring countries. Lineage B.1 accounted for 74% of sequenced cases. Lineages A, B and B.4 were detected in individuals screened at the Kenya-Tanzania border or in returning travellers. Although multiple lineages were introduced into coastal Kenya following the initial confirmed case, none other than lineage B.1 showed extensive local expansion. International points of entry were important conduits for SARS-CoV-2 importations into coastal Kenya, and early public health responses prevented the establishment of transmission of some lineages. Undetected introductions through points of entry, including imports from elsewhere in the country, gave rise to the local epidemic on the Kenyan coast.

    Transmission networks of SARS-CoV-2 in coastal Kenya during the first two waves: a retrospective genomic study

    Background: A detailed understanding of SARS-CoV-2 regional transmission networks within sub-Saharan Africa is key to guiding local public health interventions against the pandemic. Methods: Here, we analysed 1,139 SARS-CoV-2 genomes from positive samples collected between March 2020 and February 2021 across six counties of Coastal Kenya (Mombasa, Kilifi, Taita Taveta, Kwale, Tana River and Lamu) to infer virus introductions and local transmission patterns during the first two waves of infections. Virus importations were inferred using ancestral state reconstruction, and virus dispersal between counties was estimated using discrete phylogeographic analysis. Results: During Wave 1, 23 distinct Pango lineages were detected across the six counties, while during Wave 2, 29 lineages were detected; nine lineages occurred in both waves, and four appeared to be Kenya-specific (B.1.530, B.1.549, B.1.596.1 and N.8). Most of the sequenced infections belonged to lineage B.1 (n=723, 63%), which predominated in both Wave 1 (73%, followed by lineages N.8 (6%) and B.1.1 (6%)) and Wave 2 (56%, followed by lineages B.1.549 (21%) and B.1.530 (5%)). Over the study period, we estimated 280 SARS-CoV-2 virus importations into Coastal Kenya. Mombasa City, a vital tourist and commercial centre for the region, was a major route for virus imports, most of which occurred during Wave 1, when many COVID-19 government restrictions were still in force. In Wave 2, inter-county transmission predominated, resulting in the emergence of local transmission chains and diversity. Conclusions: Our analysis supports shifting COVID-19 control strategies in the region from a focus on international travel to strategies that reduce local transmission.
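
    Counting virus importations from a phylogeny amounts to reconstructing locations at internal nodes and counting branches on which the location changes into the focal region. The abstract does not specify the reconstruction method, so the sketch below swaps in Fitch parsimony, the simplest discrete ancestral state reconstruction, on a hypothetical binary tree whose tips are labelled "Coast" or "Europe"; the `Node` structure and helper names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    location: str | None = None                    # set on tips only
    children: list[Node] = field(default_factory=list)
    states: set[str] = field(default_factory=set)  # Fitch candidate set
    assigned: str | None = None                    # reconstructed location


def tip(location):
    return Node(location=location)


def join(left, right):
    return Node(children=[left, right])


def fitch_up(node):
    """Bottom-up pass: intersection of child sets if non-empty, else union."""
    if not node.children:
        node.states = {node.location}
    else:
        child_sets = [fitch_up(child) for child in node.children]
        common = set.intersection(*child_sets)
        node.states = common if common else set.union(*child_sets)
    return node.states


def fitch_down(node, parent_state=None):
    """Top-down pass: keep the parent's state when allowed, else pick one."""
    if parent_state in node.states:
        node.assigned = parent_state
    else:
        node.assigned = sorted(node.states)[0]  # deterministic tie-break
    for child in node.children:
        fitch_down(child, node.assigned)


def count_introductions(node, focal):
    """Count branches on which the reconstructed location changes into `focal`."""
    total = 0
    for child in node.children:
        if node.assigned != focal and child.assigned == focal:
            total += 1
        total += count_introductions(child, focal)
    return total


# Hypothetical rooted binary tree with tips labelled by sampling location.
tree = join(
    join(tip("Europe"), tip("Europe")),
    join(
        join(tip("Europe"), join(tip("Coast"), tip("Coast"))),
        join(tip("Europe"), join(tip("Europe"), join(tip("Coast"), tip("Coast")))),
    ),
)
fitch_up(tree)
fitch_down(tree)
print(count_introductions(tree, "Coast"))  # 2
```

    Ties in the top-down pass are broken alphabetically, which is arbitrary; on trees with several equally parsimonious reconstructions the importation count can depend on that choice, one reason probabilistic reconstruction methods are commonly used instead.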

    The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance.

    Investment in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing in Africa over the past year has led to a major increase in the number of sequences generated and used to track the pandemic on the continent, a number that now exceeds 100,000 genomes. Our results show an increase in the number of African countries that are able to sequence domestically and highlight that local sequencing enables faster turnaround times and more regular routine surveillance. Despite the limitations of low testing proportions, findings from this genomic surveillance study underscore the heterogeneous nature of the pandemic and illuminate the distinct dispersal dynamics of variants of concern on the continent, particularly Alpha, Beta, Delta, and Omicron. Sustained investment in diagnostics and genomic surveillance in Africa is needed as the virus continues to evolve while the continent faces many emerging and reemerging infectious disease threats. These investments are crucial for pandemic preparedness and response and will serve the health of the continent well into the 21st century.

    The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance

    INTRODUCTION Investment in Africa over the past year in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing has led to a massive increase in the number of sequences; to date, more than 100,000 sequences have been generated to track the pandemic on the continent. These sequences have profoundly affected how public health officials in Africa have navigated the COVID-19 pandemic. RATIONALE We demonstrate how the first 100,000 SARS-CoV-2 sequences from Africa have helped monitor the epidemic on the continent, how genomic surveillance expanded over the course of the pandemic, and how we adapted our sequencing methods to deal with an evolving virus. Finally, we examine how viral lineages have spread across the continent in a phylogeographic framework to gain insights into the underlying temporal and spatial transmission dynamics of several variants of concern (VOCs). RESULTS Our results indicate that the number of countries in Africa that can sequence the virus within their own borders is growing and that this is coupled with a shorter turnaround time from sampling to sequence submission. Ongoing evolution necessitated the continual updating of primer sets; as a result, eight primer sets were designed in tandem with viral evolution and used to ensure effective sequencing of the virus. The pandemic unfolded through multiple waves of infection, each driven by distinct genetic lineages, with B.1-like ancestral strains associated with the first wave of infections in 2020. Successive waves on the continent were fueled by different VOCs, with Alpha and Beta cocirculating in distinct spatial patterns during the second wave, and Delta and Omicron affecting the whole continent during the third and fourth waves, respectively. Phylogeographic reconstruction points toward distinct differences in viral importation and exportation patterns associated with the Alpha, Beta, Delta, and Omicron variants and subvariants, both for Africa versus the rest of the world and for viral dissemination within the continent. Our epidemiological and phylogenetic inferences therefore underscore the heterogeneous nature of the pandemic on the continent and highlight key insights and challenges, for instance the limitations of low testing proportions. We also highlight the early warning capacity that genomic surveillance in Africa has provided for the rest of the world through the detection of new lineages and variants, most recently the characterization of various Omicron subvariants. CONCLUSION Sustained investment in diagnostics and genomic surveillance in Africa is needed as the virus continues to evolve. This is important not only to help combat SARS-CoV-2 on the continent but also because it can serve as a platform for addressing the many emerging and reemerging infectious disease threats in Africa. In particular, capacity building for local sequencing within countries or within the continent should be prioritized, because it is generally associated with shorter turnaround times, providing the most benefit to local public health authorities tasked with pandemic response and mitigation and allowing for the fastest reaction to localized outbreaks. These investments are crucial for pandemic preparedness and response and will serve the health of the continent well into the 21st century.
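
    Turnaround time, as used in these abstracts, is the interval from sampling to sequence submission. A minimal sketch of computing it per group follows, assuming each record carries a collection date, a submission date, and a flag for whether sequencing was done locally; the records and their layout are hypothetical and do not come from the study.

```python
from datetime import date
from statistics import median

# Hypothetical records: (sequenced_locally, collection_date, submission_date).
records = [
    (True, date(2021, 6, 1), date(2021, 6, 15)),
    (True, date(2021, 6, 3), date(2021, 6, 20)),
    (True, date(2021, 6, 5), date(2021, 6, 12)),
    (False, date(2021, 6, 1), date(2021, 7, 20)),
    (False, date(2021, 6, 2), date(2021, 8, 1)),
]


def median_turnaround_days(records, local):
    """Median days from sample collection to sequence submission for one group."""
    days = [(submitted - collected).days
            for is_local, collected, submitted in records if is_local == local]
    return median(days)


print(median_turnaround_days(records, local=True))   # 14
print(median_turnaround_days(records, local=False))  # 54.5
```

    The median is used rather than the mean because a few long-delayed submissions would otherwise dominate the summary.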

    Replication Data for: Evaluating the Performance of Tools Used to Call Minority Variants from Whole Genome Short-Read Data

    Several minority variant callers have been developed to describe minority variant sub-populations from whole genome sequence data. These tools differ in the bioinformatics and statistical approaches they use to distinguish sequencing errors from genuine low-frequency variants. This project evaluated the diagnostic performance of four published minority variant callers and assessed their overall concordance in reporting minority variants from short-read sequence data. The ART-Illumina read simulation tool was used to generate artificial short-read datasets of varying coverage based on a Respiratory Syncytial Virus (RSV) reference genome. The samples were spiked with nucleotide variants at predetermined positions and frequencies, which were then called using FreeBayes, LoFreq, VarDict, and VarScan2. To identify the effect of data quality on the concordance and performance of the callers, we included datasets with different error profiles.
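
    The link between coverage and the detection of spiked low-frequency variants has a simple sampling explanation: a variant at frequency f in a sample sequenced to depth d is supported by a binomially distributed number of reads. The sketch below computes the probability that at least k reads carry the variant, assuming a hypothetical caller that requires k supporting reads and ignoring sequencing error, which real callers model explicitly.

```python
from math import comb


def detection_probability(depth, freq, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, freq).

    X is the number of reads carrying a variant present at allele
    frequency `freq`; sequencing errors are ignored in this toy model.
    """
    p_below = sum(
        comb(depth, k) * freq**k * (1 - freq) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Hypothetical 2% minority variant and a hypothetical 5-read support threshold.
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, 0.02, 5), 3))
```

    Under these assumptions, a 2% variant with a 5-read threshold is detected with a probability of roughly 5% at 100x depth but above 95% at 500x.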

    The Acceptance of Social Media Sites: An Empirical Study Using PLS-SEM and ML Approaches

    This study aims to build a conceptual model for measuring students' acceptance of social media in education and the factors that drive it. The study extends the Technology Acceptance Model (TAM) with social influence factors. The collected data are evaluated using machine learning approaches and partial least squares structural equation modeling (PLS-SEM). A total of 350 students enrolled at highly regarded universities in the United Arab Emirates (UAE) completed questionnaire surveys, which were then analyzed. The results suggest that students' intention to adopt social media networks in learning is significantly influenced by social influence, perceived usefulness, and perceived ease of use.