A meta-analysis of public microarray data identifies gene regulatory pathways deregulated in peripheral blood mononuclear cells from individuals with Systemic Lupus Erythematosus compared to those without
BACKGROUND: Systemic Lupus Erythematosus (SLE) is a complex, multi-systemic, autoimmune disease for which the
underlying aetiological mechanisms are poorly understood. The genetic and molecular processes underlying lupus
have been extensively investigated using a variety of -omics approaches, including genome-wide association studies,
candidate gene studies and microarray experiments of differential gene expression in lupus samples compared to
controls.
METHODS: This study analyses a combination of existing microarray data sets to identify genetic
pathways that are dysregulated in human peripheral blood mononuclear cells from SLE patients compared to unaffected
controls. Two statistical approaches, quantile discretisation and scaling, are used to combine publicly available expression
microarray datasets and perform a meta-analysis of differentially expressed genes.
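The scaling approach named in the methods can be pictured with a minimal sketch. The abstract does not give the exact procedure, so the code below assumes the simplest reading: each gene is standardised within its own dataset before the datasets are concatenated over their shared genes. The function names and the toy expression values are illustrative only and do not come from the study.

```python
from statistics import mean, pstdev


def scale_dataset(expr):
    """Standardise each gene to zero mean and unit variance within one dataset.

    expr maps a gene symbol to a list of expression values, one per sample.
    """
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        # A gene with no variation carries no signal; map it to zeros.
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled


def combine(datasets):
    """Concatenate scaled datasets over the genes that all of them share."""
    shared = set.intersection(*(set(d) for d in datasets))
    scaled = [scale_dataset(d) for d in datasets]
    return {g: [v for d in scaled for v in d[g]] for g in sorted(shared)}


# Toy values on two very different measurement scales (hypothetical).
study_a = {"IFI44": [5.0, 7.0, 9.0], "TLR7": [2.0, 2.0, 2.0], "ONLY_A": [1.0, 2.0, 3.0]}
study_b = {"IFI44": [100.0, 300.0], "TLR7": [10.0, 30.0]}
merged = combine([study_a, study_b])
```

After scaling, values from both studies sit on a comparable scale, and genes measured in only one study are dropped from the merged table.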
RESULTS: Differentially expressed genes implicated in interferon signaling were identified by the meta-analysis,
in agreement with the findings of the individual studies that generated the datasets used. In contrast to the
individual studies, however, the meta-analysis and subsequent pathway analysis additionally highlighted TLR
signaling, oxidative phosphorylation and diapedesis and adhesion regulatory networks as being differentially
regulated in peripheral blood mononuclear cells (PBMCs) from SLE patients compared to controls.
CONCLUSION: Our analysis demonstrates that it is possible to derive additional information from publicly
available expression data using meta-analysis techniques, which is particularly relevant to research into rare
diseases where sample numbers can be limiting.
Computational genomics approaches for kidney diseases in Africa
Philosophiae Doctor - PhD
End stage renal disease (ESRD), a more severe form of kidney disease, is considered to be a complex trait that may involve multiple processes which work together on a background of a significant genetic susceptibility. Black Africans have been shown to bear an unequal burden of this disease compared to white Europeans, Americans and Caucasians. Despite this, most of the genetic and epidemiological advances made in understanding the aetiology of kidney diseases have been made in populations outside of sub-Saharan Africa (SSA). Very little research has been undertaken to investigate the key genetic factors that drive ESRD in Africans compared to patients from the rest of the world. Therefore, the primary aim of this Bioinformatics thesis was twofold. Firstly, the aim was to develop and apply a whole exome sequencing (WES) analysis pipeline and use it to understand a genetic mechanism underlying ESRD in a South African population of mixed ancestry. I hypothesized that the pipeline would enable the discovery of highly penetrant rare variants with large effect size, which are expected to explain an important fraction of the genetic aetiology and pathogenesis of ESRD in these African patients. Secondly, the aim was to develop and set up a multicenter clinical database that would capture a plethora of clinical data for patients with Lupus, one of the risk factors of ESRD. From WES of six family members (five cases and one control), a total of 23 196 SNVs, 1445 insertions and 1340 deletions overlapped amongst all affected family members. The variants were consistent with an autosomal dominant inheritance pattern inferred in this family. Of these, only 1550 SNVs, 67 insertions and 112 deletions were present in all affected family members but absent in the unaffected family member. Following detailed evaluation of evidence for variant implication and pathogenicity, only 3 very rare heterozygous missense variants in 3 genes, COL4A1 [p.R476W], ICAM1 [p.P352L] and COL16A1 [p.T116M], were considered potentially disease causing. Computational relatedness analysis revealed the approximate amount of DNA shared by family members and confirmed reported relatedness. Genotyping for the Y chromosome was additionally performed to help confirm sample identity. The clinical database has been designed and is being piloted at Groote Schuur Hospital at the University of Cape Town. Currently, about 290 patients have been entered in the registry. The resources and methodologies developed in this thesis have the potential to contribute not only to the understanding of ESRD and its risk factors, but also to the successful application of WES in clinical practice. Importantly, the thesis contributes significant information on the genetics of ESRD based on an African family and will also improve scientific infrastructure on the African continent. Clinical databasing will go a long way to enable clinicians to collect and store standardised clinical data for their patients.
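The segregation filter described in the thesis abstract (variants present in every affected family member but absent from the unaffected member) amounts to a set intersection followed by a set difference. The sketch below is a hedged illustration, not the thesis pipeline: the function name, the tuple representation of a variant (chromosome, position, reference allele, alternate allele) and the toy variants are all assumptions.

```python
def segregating_variants(affected, unaffected):
    """Return variants carried by every affected member and by no unaffected member.

    Both arguments map a sample name to a set of variant tuples.
    """
    shared = set.intersection(*affected.values())
    seen_in_unaffected = set().union(*unaffected.values())
    return shared - seen_in_unaffected


# Hypothetical toy variants: (chromosome, position, ref, alt).
affected = {
    "case1": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    "case2": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
}
unaffected = {"control": {("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")}}

candidates = segregating_variants(affected, unaffected)
```

In this toy family only the chr1 variant survives: the chr3 variant is missing from one case, and the chr2 variant is also carried by the control. Real pipelines would apply further filters (frequency, predicted pathogenicity) that are not shown here.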
Lupus nephritis is associated with poor pregnancy outcomes in pregnant SLE patients in Cape Town: a retrospective analysis
Introduction: Systemic lupus erythematosus (SLE) is a multi-system auto-immune disease common in females of child-bearing age. The effects of pregnancy on SLE and vice versa have not been well characterised in Africans. The aim of this study is to describe the pregnancy outcomes of patients with SLE presenting to the maternity department of Groote Schuur Hospital, Cape Town. Methods: This study was designed as a retrospective review of records of pregnant women known to have SLE and followed up at the maternity section of Groote Schuur Hospital. The review period was from 1 January 2003 to 31 December 2013. Results: There were 61 pregnancies reviewed in 49 patients; 80.3% of the pregnancies were in patients of mixed ancestry and the rest (19.7%) in black African patients. The mean age at presentation of the current pregnancy was 27.2 ± 5.0 years. Mean gestational age at presentation and at delivery was 13.0 ± 6.0 weeks and 28.9 ± 9.8 weeks respectively, and 47.5% of the pregnancies were in patients with lupus nephritis (LN). Thirty-nine (63.9%) pregnancies reached the third trimester and 11.5% of all pregnancies ended in the first trimester. There were fewer live births to mothers of African ancestry than to those of mixed ancestry (p=0.001). In 55.7% of the pregnancies, no flare was reported while a renal flare was reported in 23%. Pregnancies in patients with LN had higher frequencies of flares (58.6% vs 31.3%; p=0.032) and pre-eclampsia (34.5% vs 12.5%; p=0.041), longer stays in hospital (12.0 ± 9.1 days vs 6.1 ± 5.1 days; p=0.004) and lower birth weight babies (1.94 ± 1.02 kg vs 2.55 ± 0.95 kg; p=0.046) than pregnancies in patients without LN. Only 36 (59%) of the neonates were discharged home alive and of these 2 (5.6%) were born to mothers of black African ancestry (p=0.001). Conclusion: Increased lupus activity in pregnant SLE patients may account for the increased deaths of neonates born to SLE mothers. Patients of black African descent and those with LN tend to have a poorer outcome. A multi-disciplinary approach to the management of SLE patients (of child-bearing age or pregnant) needs to be further assessed for better outcomes.
A review and comparative study of cancer detection using machine learning : SBERT and SimCSE application
AVAILABILITY OF DATA AND MATERIALS : The data can be accessed at the host database (The European Genome-phenome Archive at the European Bioinformatics
Institute, accession number: EGAD00001004582).
BACKGROUND : Using visual, biological, and electronic health records data as the sole
input source, pretrained convolutional neural networks and conventional machine
learning methods have been heavily employed for the identification of various malignancies.
Initially, a series of preprocessing steps and image segmentation steps are
performed to extract region of interest features from noisy features. Then, the extracted
features are applied to several machine learning and deep learning methods for the
detection of cancer.
METHODS : In this work, a review of all the methods that have been applied to develop
machine learning algorithms that detect cancer is provided. With more than 100 types
of cancer, this study only examines research on the four most common and prevalent
cancers worldwide: lung, breast, prostate, and colorectal cancer. Next, by using
state-of-the-art sentence transformers namely: SBERT (2019) and the unsupervised
SimCSE (2021), this study proposes a new methodology for detecting cancer. This
method requires raw DNA sequences of matched tumor/normal pair as the only input.
The learnt DNA representations retrieved from SBERT and SimCSE will then be sent to
machine learning algorithms (XGBoost, Random Forest, LightGBM, and CNNs) for classification.
As far as we are aware, SBERT and SimCSE transformers have not been applied
to represent DNA sequences in cancer detection settings.
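Sentence transformers such as SBERT and SimCSE take text as input, so a raw DNA sequence has to be presented as something sentence-like first. The abstract does not say how this was done. The sketch below shows one common convention, assumed here purely for illustration: splitting the sequence into overlapping k-mers that play the role of words. The function name and the values of `k` and `stride` are hypothetical.

```python
def kmer_sentence(sequence, k=6, stride=1):
    """Turn a DNA string into a space-separated 'sentence' of overlapping k-mers."""
    seq = sequence.upper()
    # Slide a window of width k along the sequence; short inputs yield no k-mers.
    kmers = [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]
    return " ".join(kmers)


example = kmer_sentence("acgtac", k=3)
# example == "ACG CGT GTA TAC"
```

A string produced this way could then be passed to a sentence encoder, and the resulting embedding vectors to a classifier such as XGBoost, as the methods paragraph describes.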
RESULTS : The XGBoost model, which had the highest overall accuracy of 73 ± 0.13 %
using SBERT embeddings and 75 ± 0.12 % using SimCSE embeddings, was the best
performing classifier. In light of these findings, it can be concluded that incorporating
sentence representations from SimCSE’s sentence transformer only marginally
improved the performance of machine learning models.
FUNDING : The South African Medical Research Council (SAMRC), through its Division of Research Capacity Development under the Internship Scholarship Program, from funding received from the South African National Treasury.
https://bmcbioinformatics.biomedcentral.com
Computer Science
School of Health Systems and Public Health (SHSPH)
Discriminatory Gleason grade group signatures of prostate cancer : an application of machine learning methods
One of the most precise methods to detect prostate cancer is by evaluation of a stained
biopsy by a pathologist under a microscope. Regions of the tissue are assessed and graded
according to the observed histological pattern. However, this is not only laborious, but also
relies on the experience of the pathologist and tends to suffer from the lack of reproducibility
of biopsy outcomes across pathologists. As a result, computational approaches are being
sought and machine learning has been gaining momentum in the prediction of the Gleason
grade group. To date, machine learning literature has addressed this problem by using features from magnetic resonance imaging images, whole slide images, tissue microarrays,
gene expression data, and clinical features. However, there is a gap with regards to predicting the Gleason grade group using DNA sequences as the only input source to the machine
learning models. In this work, using whole genome sequence data from South African prostate cancer patients, an application of machine learning and biological experiments were
combined to understand the challenges that are associated with the prediction of the Gleason grade group. A series of machine learning binary classifiers (XGBoost, LSTM, GRU,
LR, RF) were created relying only on DNA sequences as input features. None of the models
was able to adequately discriminate between the DNA sequences of the studied Gleason
grade groups (Gleason grade group 1 and 5). However, the models were further evaluated
in the prediction of tumor DNA sequences from matched-normal DNA sequences, given
DNA sequences as the only input source. In this new problem, the models performed
acceptably better than before with the XGBoost model achieving the highest accuracy of 74
± 01, F1 score of 79 ± 01, recall of 99 ± 0.0, and precision of 66 ± 0.1.
Funding: The South African Medical Research Council (SAMRC), through its Division of Research Capacity Development under the Internship Scholarship Program, from funding received from the South African National Treasury.
http://www.plosone.org
Computer Science
School of Health Systems and Public Health (SHSPH)
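The F1 score reported for the tumor-versus-normal task in the Gleason grade group abstract can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below assumes the reported figures are percentages (precision 66, recall 99); the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the abstract, read as proportions (an assumption).
reported_precision = 0.66
reported_recall = 0.99
f1 = f1_score(reported_precision, reported_recall)
# f1 is about 0.792, in line with the reported F1 score of 79.
```

The combination of near-perfect recall with modest precision suggests a classifier that labels most sequences as tumor, which is worth keeping in mind when reading the accuracy figure.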
Baseline predictors of mortality among predominantly rural-dwelling end-stage renal disease patients on chronic dialysis therapies in Limpopo, South Africa
BACKGROUND: Dialysis therapy for end-stage renal disease (ESRD) continues to be the readily available renal replacement option in developing countries. While the impact of rural/remote dwelling on mortality among dialysis patients in developed countries is known, it remains to be defined in sub-Saharan Africa. METHODS: A single-center database of end-stage renal disease patients on chronic dialysis therapies treated between 2007 and 2014 at the Polokwane Kidney and Dialysis Centre (PKDC) of the Pietersburg Provincial Hospital, Limpopo, South Africa, was retrospectively reviewed. All-cause, cardiovascular, and infection-related mortalities were assessed and associated baseline predictors determined. RESULTS: Of the 340 patients reviewed, 52.1% were male, 92.9% were black Africans, 1.8% were positive for the human immunodeficiency virus (HIV), and 87.5% were rural dwellers. The average distance travelled to the dialysis centre was 112.3 ± 73.4 km, while 67.6% of patients lived in formal housing. Estimated glomerular filtration rate (eGFR) at dialysis initiation was 7.1 ± 3.7 ml/min, while hemodialysis (HD) was the predominant modality offered (57.1%). Ninety-two (92) deaths were recorded over the duration of follow-up, with the largest share (34.8%) of deaths arising from infection-related causes. Continuous ambulatory peritoneal dialysis (CAPD) was a significant predictor of all-cause mortality (HR: 1.62, CI: 1.07-2.46) and infection-related mortality (HR: 2.27, CI: 1.13-4.60). On multivariable Cox regression, CAPD remained a significant predictor of all-cause mortality (HR: 2.00, CI: 1.29-3.10), while the risk of death among CAPD patients was also significantly modified by diabetes mellitus (DM) status (HR: 4.99, CI: 2.13-11.71). CONCLUSION: CAPD among predominantly rural dwelling patients in the Limpopo province of South Africa is associated with an increased risk of death from all causes and from infection-related causes.
Identification and characterization of microRNAs expressed in the African malaria vector Anopheles funestus life stages using high throughput sequencing
Background: Over the past several years, thousands of microRNAs (miRNAs) have been identified in the genomes of various insects through cloning and sequencing or even by computational prediction. However, the number of miRNAs identified in anopheline species is low and little is known about their role. The mosquito Anopheles funestus is one of the dominant malaria vectors in Africa, which infects and kills millions of people every year. Therefore, small RNA molecules isolated from the four life stages (eggs, larvae, pupae and unfed adult females) of An. funestus were sequenced using next generation sequencing technology. Results: High throughput sequencing of four replicates in combination with computational analysis identified 107 mature miRNA sequences expressed in the An. funestus mosquito. These include 20 novel miRNAs without sequence identity in any organism and eight miRNAs not previously reported in the Anopheles genus but known in non-anopheles mosquitoes. Finally, the changes in the expression of miRNAs during mosquito development were determined and the analysis showed that many miRNAs have stage-specific expression, and are co-transcribed and co-regulated during development. Conclusions: This study presents the first direct experimental evidence of miRNAs in An. funestus and the first profiling study of miRNAs associated with maturation in this mosquito. Overall, the results indicate that miRNAs play important roles during growth and development. Silencing such molecules in a specific life stage could decrease the vector population and therefore interrupt malaria transmission.
Normalization and statistical methods for crossplatform expression array analysis
Magister Scientiae - MSc
A large volume of gene expression data exists in public repositories like the NCBI’s Gene Expression Omnibus (GEO) and the EBI’s ArrayExpress, and there is a significant opportunity to re-use these data in various combinations for novel in-silico analyses that would otherwise be too costly to perform or for which the equivalent sample numbers would be difficult to collect. For example, combining and re-analysing large numbers of data sets from the same cancer type would increase statistical power, while the effects of individual study-specific variability are weakened, which would result in more reliable gene expression signatures. Similarly, as the number of normal control samples associated with various cancer datasets is often limiting, datasets can be combined to establish a reliable baseline for accurate differential expression analysis. However, combining different microarray studies is hampered by the fact that different studies use different analysis techniques, microarray platforms and experimental protocols. We have developed and optimised a method which transforms gene expression measurements from continuous to discrete data points by grouping similarly expressed genes into quantiles on a per-sample basis. Each probe on each chip is then cross-mapped to the gene it represents, enabling us to integrate experiments based on the genes they have in common across different platforms. We optimised the quantile discretization method on previously published prostate cancer datasets produced on two different array technologies and then applied it to a larger breast cancer dataset of 411 samples from 8 microarray platforms. Statistical analysis of the breast cancer datasets identified 1371 differentially expressed genes. Cluster, gene set enrichment and pathway analysis identified functional groups that were previously described in breast cancer, and we also identified a novel module of genes encoding ribosomal proteins that have not been previously reported, but whose overall functions have been implicated in cancer development and progression. The former indicates that our integration method does not destroy the statistical signal in the original data, while the latter is strong evidence that the increased sample size increases the chances of finding novel gene expression signatures. Such signatures are also robust to inter-population variation, and show promise for translational applications like tumour grading, disease subtype classification, informing treatment selection and molecular prognostics.
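The two steps described in this abstract, mapping probes to genes and discretising expression into per-sample quantiles, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis method: averaging probes that map to the same gene, the rank-based binning rule, the number of bins, and all names and toy values are hypothetical.

```python
def probes_to_genes(probe_values, probe_to_gene):
    """Collapse one sample's probe-level values to gene level.

    Probes without a gene annotation are dropped; probes sharing a gene are
    averaged (an assumption, other summaries are possible).
    """
    per_gene = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene.setdefault(gene, []).append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in per_gene.items()}


def discretise_sample(expr, n_bins=4):
    """Replace each gene's value by its quantile bin (0..n_bins-1) within one sample."""
    ranked = sorted(expr, key=expr.get)  # genes from lowest to highest expression
    n = len(ranked)
    return {gene: rank * n_bins // n for rank, gene in enumerate(ranked)}


# Two hypothetical platforms measuring the same genes on different scales.
map_a = {"a1": "G1", "a2": "G2", "a3": "G3", "a4": "G4"}
map_b = {"b1": "G1", "b2": "G2", "b3": "G3", "b4": "G4"}
sample_a = {"a1": 2.0, "a2": 4.0, "a3": 8.0, "a4": 6.0}
sample_b = {"b1": 200.0, "b2": 400.0, "b3": 900.0, "b4": 700.0, "b5": 5.0}

bins_a = discretise_sample(probes_to_genes(sample_a, map_a), n_bins=2)
bins_b = discretise_sample(probes_to_genes(sample_b, map_b), n_bins=2)
```

Because only the within-sample ordering of genes is kept, the two samples end up with identical bin labels despite the hundredfold difference in raw scale, which is what makes the discretised values comparable across platforms. Tied values are ordered arbitrarily in this sketch.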
Factors associated with contraception non-use among women aged 15-24 years in migrant communities in Southern Africa in 2018
A research report submitted in partial fulfilment of the requirements for the degree of Master of Science in Epidemiology (Epidemiology and Biostatistics)
to the Faculty of Health Sciences, School of Public Health, University of the Witwatersrand, Johannesburg, 2022
Introduction
The non-use of contraceptives among young women aged 15-24 years is a public health concern because of its adverse social and reproductive health outcomes. The extent of contraceptive non-use, as well as the factors associated with it among young women in high migrant communities, needs to be investigated further. The aim of this study was to investigate factors associated with contraceptive non-use among young women residing in high migrant communities in Southern Africa.
Methods
The study was a secondary data analysis conducted within the larger sexual and reproductive health rights and HIV knows no borders project (SRHR-HIV). The SRHR-HIV project was a longitudinal study that included a cross-sectional baseline survey conducted in 2018 to collect data on sexual and reproductive health in 10 high migrant communities in six Southern African countries. In this analysis, the study population was restricted to women aged 15-24 years. Descriptive statistics and multivariable logistic regressions were performed to investigate demographic, socio-economic and behavioral factors that are associated with contraceptive non-use. Additional analyses were conducted using multi-level logistic regressions.
Results
Of the 1 242 eligible women aged 15-24 years, 588 reported not having used any method of contraception, representing a prevalence of 47.3%. Overall, 69% of the women were non-migrants, 19% were internal migrants and 12% were international migrants, the smallest group. Contraceptive non-use was highest among international migrants (56%) and lowest among internal migrants (38%). Using multivariable logistic regression, age, religion, and ability to refuse sex were significant factors (p<0.05) associated with contraceptive non-use among young women. Contraceptive non-use decreased with increasing age (aOR 0.75, 95% CI 0.69, 0.81) and was higher for women unable to refuse sex (aOR 1.39, 95% CI 0.92, 2.11).
Contraceptive non-use was also higher among women of the Catholic (aOR 1.75, 95% CI 0.81, 3.77), Protestant (aOR 2.58, 95% CI 1.18, 5.68) and Islamic (aOR 4.62, 95% CI 1.36, 15.73) religions.
Conclusion
The findings of this study highlighted the influence of age, religion, and inability to refuse sex as some of the factors associated with contraceptive non-use among young women. These
factors were found not to differ among migrants. The findings should be considered and reflected in public health policies to address barriers to the use of contraceptives
by women aged 15-24 years.
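The prevalence figure and the confidence intervals quoted in the results of this report can be inspected with a few lines of arithmetic. The sketch below is not part of the report; the function names are illustrative, and the rule that an interval spanning 1 is compatible with no association is the conventional reading of an odds ratio interval, stated here as an assumption about how a reader would interpret the figures.

```python
def prevalence(cases, total):
    """Prevalence as a percentage."""
    return 100.0 * cases / total


def ci_excludes_one(lower, upper):
    """True when an odds-ratio interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# Counts and 95% intervals as quoted in the results.
non_use = prevalence(588, 1242)
reported_intervals = {
    "age": (0.69, 0.81),
    "unable to refuse sex": (0.92, 2.11),
    "Catholic": (0.81, 3.77),
    "Protestant": (1.18, 5.68),
    "Islam": (1.36, 15.73),
}
excludes_one = {name: ci_excludes_one(*ci) for name, ci in reported_intervals.items()}
```

The computed prevalence rounds to the 47.3% given in the text. The dictionary `excludes_one` simply records, for each quoted interval, whether it lies wholly on one side of 1.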
