45 research outputs found

    A meta-analysis of public microarray data identifies gene regulatory pathways deregulated in peripheral blood mononuclear cells from individuals with Systemic Lupus Erythematosus compared to those without

    BACKGROUND: Systemic Lupus Erythematosus (SLE) is a complex, multi-systemic autoimmune disease whose underlying aetiological mechanisms are poorly understood. The genetic and molecular processes underlying lupus have been extensively investigated using a variety of -omics approaches, including genome-wide association studies, candidate gene studies and microarray experiments of differential gene expression in lupus samples compared to controls. METHODS: This study analyses a combination of existing microarray data sets to identify genetic pathways that are dysregulated in human peripheral blood mononuclear cells (PBMCs) from SLE patients compared to unaffected controls. Two statistical approaches, quantile discretisation and scaling, are used to combine publicly available expression microarray datasets and perform a meta-analysis of differentially expressed genes. RESULTS: Differentially expressed genes implicated in interferon signaling were identified by the meta-analysis, in agreement with the findings of the individual studies that generated the datasets used. In contrast to the individual studies, however, the meta-analysis and subsequent pathway analysis additionally highlighted TLR signaling, oxidative phosphorylation, and diapedesis and adhesion regulatory networks as being differentially regulated in PBMCs from SLE patients compared to controls. CONCLUSION: Our analysis demonstrates that it is possible to derive additional information from publicly available expression data using meta-analysis techniques, which is particularly relevant to research into rare diseases where sample numbers can be limiting.

    Computational genomics approaches for kidney diseases in Africa

    Philosophiae Doctor - PhD. End-stage renal disease (ESRD), a more severe form of kidney disease, is considered to be a complex trait that may involve multiple processes which work together on a background of significant genetic susceptibility. Black Africans have been shown to bear an unequal burden of this disease compared to white Europeans, Americans and Caucasians. Despite this, most of the genetic and epidemiological advances in understanding the aetiology of kidney diseases have been made in populations outside sub-Saharan Africa (SSA). Very little research has been undertaken to investigate the key genetic factors that drive ESRD in Africans compared to patients from the rest of the world. Therefore, the primary aim of this Bioinformatics thesis was twofold. The first aim was to develop and apply a whole exome sequencing (WES) analysis pipeline and use it to understand a genetic mechanism underlying ESRD in a South African population of mixed ancestry. I hypothesized that the pipeline would enable the discovery of highly penetrant rare variants with large effect sizes, which are expected to explain an important fraction of the genetic aetiology and pathogenesis of ESRD in these African patients. The second aim was to develop and set up a multicenter clinical database that would capture a wide range of clinical data for patients with lupus, one of the risk factors for ESRD. From WES of six family members (five cases and one control), a total of 23 196 SNVs, 1445 insertions and 1340 deletions overlapped amongst all affected family members. The variants were consistent with the autosomal dominant inheritance pattern inferred in this family. Of these, only 1550 SNVs, 67 insertions and 112 deletions were present in all affected family members but absent in the unaffected family member. Following detailed evaluation of the evidence for variant implication and pathogenicity, only 3 very rare heterozygous missense variants in 3 genes, COL4A1 [p.R476W], ICAM1 [p.P352L] and COL16A1 [p.T116M], were considered potentially disease causing. Computational relatedness analysis revealed the approximate amount of DNA shared by family members and confirmed the reported relatedness. Genotyping for the Y chromosome was additionally performed to help confirm sample identity. The clinical database has been designed and is being piloted at Groote Schuur Hospital at the University of Cape Town. Currently, about 290 patients have been entered in the registry. The resources and methodologies developed in this thesis have the potential to contribute not only to the understanding of ESRD and its risk factors, but also to the successful application of WES in clinical practice. Importantly, the thesis contributes significant information on the genetics of ESRD based on an African family and will also improve scientific infrastructure on the African continent. Clinical databasing will go a long way towards enabling clinicians to collect and store standardised clinical data for their patients.
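
The segregation step described in this abstract (keep variants present in all affected family members but absent in the unaffected member) can be sketched with set operations. This is a minimal illustration only, not the thesis pipeline: the function name, sample names and variant identifiers below are hypothetical, and a real analysis would read genotype calls from VCF files.

```python
def segregating_variants(calls, affected, unaffected):
    """Return variants carried by every affected sample and by no unaffected sample.

    `calls` maps a sample name to the set of variant identifiers called in it.
    """
    # Variants shared by all affected family members.
    shared = set.intersection(*(calls[s] for s in affected))
    # Any variant observed in at least one unaffected family member.
    seen_in_unaffected = set().union(*(calls[s] for s in unaffected))
    return shared - seen_in_unaffected


# Hypothetical toy data: two cases and one control.
calls = {
    "case1": {"chr1:100A>G", "chr2:200C>T", "chr3:300G>A"},
    "case2": {"chr1:100A>G", "chr2:200C>T"},
    "control1": {"chr2:200C>T", "chr4:400T>C"},
}
candidates = segregating_variants(calls, ["case1", "case2"], ["control1"])
print(candidates)
```

Only the variant shared by both cases and missing from the control survives the filter. Pathogenicity and rarity filtering, as described in the abstract, would follow as separate steps.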

    Lupus nephritis is associated with poor pregnancy outcomes in pregnant SLE patients in Cape Town: a retrospective analysis

    Introduction: Systemic lupus erythematosus (SLE) is a multi-system auto-immune disease common in females of child-bearing age. The effects of pregnancy on SLE, and vice versa, have not been well characterised in Africans. The aim of this study is to describe the pregnancy outcomes of patients with SLE presenting to the maternity department of Groote Schuur Hospital, Cape Town. Methods: This study was designed as a retrospective review of the records of pregnant women known to have SLE and followed up at the maternity section of Groote Schuur Hospital. The review period was from 1 January 2003 to 31 December 2013. Results: There were 61 pregnancies reviewed in 49 patients; 80.3% of the pregnancies were in patients of mixed ancestry and the rest (19.7%) in black African patients. The mean age at presentation of the current pregnancy was 27.2 ± 5.0 years. Mean gestational age at presentation and delivery was 13.0 ± 6.0 weeks and 28.9 ± 9.8 weeks respectively, and 47.5% of the pregnancies were in patients with lupus nephritis (LN). Thirty-nine (63.9%) pregnancies reached the third trimester and 11.5% of all pregnancies ended in the first trimester. There was a lower number of live births to mothers of African ancestry than to those of mixed ancestry (p=0.001). In 55.7% of the pregnancies, no flare was reported, while a renal flare was reported in 23%. Pregnancies in patients with LN had higher frequencies of flares (58.6% vs 31.3%; p=0.032) and pre-eclampsia (34.5% vs 12.5%; p=0.041), longer stays in hospital (12.0 ± 9.1 days vs 6.1 ± 5.1 days; p=0.004) and lower birth weights (1.94 ± 1.02 kg vs 2.55 ± 0.95 kg; p=0.046) than those in patients without LN. Only 36 (59%) of the neonates were discharged home alive and of these 2 (5.6%) were born to mothers of black African ancestry (p=0.001). Conclusion: Increased lupus activity in pregnant SLE patients may account for the increased deaths of neonates born to SLE mothers. Patients of black African descent and those with LN tend to have a poorer outcome. A multi-disciplinary approach to the management of SLE patients (of child-bearing age or pregnant) needs to be further assessed for better outcomes.

    A review and comparative study of cancer detection using machine learning : SBERT and SimCSE application

    AVAILABILITY OF DATA AND MATERIALS : The data can be accessed at the host database (The European Genome-phenome Archive at the European Bioinformatics Institute, accession number: EGAD00001004582 Data access). BACKGROUND : Using visual, biological, and electronic health records data as the sole input source, pretrained convolutional neural networks and conventional machine learning methods have been heavily employed for the identification of various malignancies. Initially, a series of preprocessing and image segmentation steps are performed to extract region of interest features from noisy features. Then, the extracted features are applied to several machine learning and deep learning methods for the detection of cancer. METHODS : In this work, a review of the methods that have been applied to develop machine learning algorithms that detect cancer is provided. With more than 100 types of cancer, this study only examines research on the four most common and prevalent cancers worldwide: lung, breast, prostate, and colorectal cancer. Next, by using state-of-the-art sentence transformers, namely SBERT (2019) and the unsupervised SimCSE (2021), this study proposes a new methodology for detecting cancer. This method requires raw DNA sequences of matched tumor/normal pairs as the only input. The learnt DNA representations retrieved from SBERT and SimCSE are then sent to machine learning algorithms (XGBoost, Random Forest, LightGBM, and CNNs) for classification. As far as we are aware, SBERT and SimCSE transformers have not been applied to represent DNA sequences in cancer detection settings. RESULTS : The XGBoost model, which had the highest overall accuracy of 73 ± 0.13 % using SBERT embeddings and 75 ± 0.12 % using SimCSE embeddings, was the best performing classifier. In light of these findings, it can be concluded that incorporating sentence representations from SimCSE’s sentence transformer only marginally improved the performance of machine learning models.
    Funding: The South African Medical Research Council (SAMRC) through its Division of Research Capacity Development under the Internship Scholarship Program from funding received from the South African National Treasury. https://bmcbioinformatics.biomedcentral.com
    Computer Science; School of Health Systems and Public Health (SHSPH)

    Discriminatory Gleason grade group signatures of prostate cancer : an application of machine learning methods

    One of the most precise methods to detect prostate cancer is evaluation of a stained biopsy by a pathologist under a microscope. Regions of the tissue are assessed and graded according to the observed histological pattern. However, this is not only laborious, but also relies on the experience of the pathologist and tends to suffer from a lack of reproducibility of biopsy outcomes across pathologists. As a result, computational approaches are being sought and machine learning has been gaining momentum in the prediction of the Gleason grade group. To date, the machine learning literature has addressed this problem by using features from magnetic resonance images, whole slide images, tissue microarrays, gene expression data, and clinical features. However, there is a gap with regard to predicting the Gleason grade group using DNA sequences as the only input source to the machine learning models. In this work, using whole genome sequence data from South African prostate cancer patients, machine learning and biological experiments were combined to understand the challenges associated with the prediction of the Gleason grade group. A series of machine learning binary classifiers (XGBoost, LSTM, GRU, LR, RF) were created relying only on DNA sequences as input features. None of the models was able to adequately discriminate between the DNA sequences of the studied Gleason grade groups (Gleason grade group 1 and 5). However, the models were further evaluated on the prediction of tumor DNA sequences from matched-normal DNA sequences, given DNA sequences as the only input source. In this new problem, the models performed better than before, with the XGBoost model achieving the highest accuracy of 74 ± 01, F1 score of 79 ± 01, recall of 99 ± 0.0, and precision of 66 ± 0.1.
    Funding: The South African Medical Research Council (SAMRC) through its Division of Research Capacity Development under the Internship Scholarship Program from funding received from the South African National Treasury. http://www.plosone.org
    Computer Science; School of Health Systems and Public Health (SHSPH)
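
The F1 score reported in this abstract is the harmonic mean of precision and recall, so it can be cross-checked from the other two figures. The sketch below assumes the reported values are percentages (precision 66, recall 99); the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision of 66% and recall of 99%, expressed as fractions.
f1 = f1_score(0.66, 0.99)
print(round(100 * f1))  # prints 79
```

The result matches the reported F1 score of 79. The combination of near-perfect recall and modest precision suggests the classifier labels most sequences as tumor.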

    Baseline predictors of mortality among predominantly rural-dwelling end-stage renal disease patients on chronic dialysis therapies in Limpopo, South Africa

    BACKGROUND: Dialysis therapy for end-stage renal disease (ESRD) continues to be the most readily available renal replacement option in developing countries. While the impact of rural/remote dwelling on mortality among dialysis patients in developed countries is known, it remains to be defined in sub-Saharan Africa. METHODS: A single-center database of end-stage renal disease patients on chronic dialysis therapies treated between 2007 and 2014 at the Polokwane Kidney and Dialysis Centre (PKDC) of the Pietersburg Provincial Hospital, Limpopo, South Africa, was retrospectively reviewed. All-cause, cardiovascular, and infection-related mortalities were assessed and associated baseline predictors determined. RESULTS: Of the 340 patients reviewed, 52.1% were male, 92.9% were black Africans, 1.8% were positive for the human immunodeficiency virus (HIV), and 87.5% were rural dwellers. The average distance travelled to the dialysis centre was 112.3 ± 73.4 km, while 67.6% of patients lived in formal housing. Estimated glomerular filtration rate (eGFR) at dialysis initiation was 7.1 ± 3.7 mL/min, and hemodialysis (HD) was the predominant modality offered (57.1%). Ninety-two (92) deaths were recorded over the duration of follow-up, with the largest share (34.8%) of deaths arising from infection-related causes. Continuous ambulatory peritoneal dialysis (CAPD) was a significant predictor of all-cause mortality (HR: 1.62, CI: 1.07-2.46) and infection-related mortality (HR: 2.27, CI: 1.13-4.60). On multivariable Cox regression, CAPD remained a significant predictor of all-cause mortality (HR: 2.00, CI: 1.29-3.10), while the risk of death among CAPD patients was also significantly modified by diabetes mellitus (DM) status (HR: 4.99, CI: 2.13-11.71). CONCLUSION: CAPD among predominantly rural-dwelling patients in the Limpopo province of South Africa is associated with an increased risk of death from all causes and from infection-related causes.

    Identification and characterization of microRNAs expressed in the African malaria vector Anopheles funestus life stages using high throughput sequencing

    Background: Over the past several years, thousands of microRNAs (miRNAs) have been identified in the genomes of various insects through cloning and sequencing or by computational prediction. However, the number of miRNAs identified in anopheline species is low and little is known about their role. The mosquito Anopheles funestus is one of the dominant vectors in Africa of malaria, which infects and kills millions of people every year. Therefore, small RNA molecules isolated from the four life stages (eggs, larvae, pupae and unfed adult females) of An. funestus were sequenced using next generation sequencing technology. Results: High throughput sequencing of four replicates in combination with computational analysis identified 107 mature miRNA sequences expressed in the An. funestus mosquito. These include 20 novel miRNAs without sequence identity in any organism and eight miRNAs not previously reported in the Anopheles genus but known in non-anopheline mosquitoes. Finally, the changes in the expression of miRNAs during mosquito development were determined, and the analysis showed that many miRNAs have stage-specific expression and are co-transcribed and co-regulated during development. Conclusions: This study presents the first direct experimental evidence of miRNAs in An. funestus and the first profiling study of miRNAs associated with maturation in this mosquito. Overall, the results indicate that miRNAs play important roles during growth and development. Silencing such molecules in a specific life stage could decrease the vector population and therefore interrupt malaria transmission.

    Normalization and statistical methods for crossplatform expression array analysis

    Magister Scientiae - MSc. A large volume of gene expression data exists in public repositories like the NCBI’s Gene Expression Omnibus (GEO) and the EBI’s ArrayExpress, and there is a significant opportunity to re-use these data in various combinations for novel in-silico analyses that would otherwise be too costly to perform or for which the equivalent sample numbers would be difficult to collect. For example, combining and re-analysing large numbers of data sets from the same cancer type would increase statistical power while weakening the effects of individual study-specific variability, which would result in more reliable gene expression signatures. Similarly, as the number of normal control samples associated with various cancer datasets is often limiting, datasets can be combined to establish a reliable baseline for accurate differential expression analysis. However, combining different microarray studies is hampered by the fact that different studies use different analysis techniques, microarray platforms and experimental protocols. We have developed and optimised a method which transforms gene expression measurements from continuous to discrete data points by grouping similarly expressed genes into quantiles on a per-sample basis. Each probe on each chip is then cross-mapped to the gene it represents, enabling us to integrate experiments based on the genes they have in common across different platforms. We optimised the quantile discretization method on previously published prostate cancer datasets produced on two different array technologies and then applied it to a larger breast cancer dataset of 411 samples from 8 microarray platforms. Statistical analysis of the breast cancer datasets identified 1371 differentially expressed genes. Cluster, gene set enrichment and pathway analysis identified functional groups that were previously described in breast cancer, and we also identified a novel module of genes encoding ribosomal proteins that has not been previously reported, but whose overall functions have been implicated in cancer development and progression. The former indicates that our integration method does not destroy the statistical signal in the original data, while the latter is strong evidence that the increased sample size increases the chances of finding novel gene expression signatures. Such signatures are also robust to inter-population variation, and show promise for translational applications like tumour grading, disease subtype classification, informing treatment selection and molecular prognostics.
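
The per-sample quantile discretisation described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the function name, the choice of four bins, the rank-based binning rule and the toy expression values are all hypothetical, and probes are assumed to be already mapped to gene identifiers.

```python
def quantile_discretise(sample, n_bins=4):
    """Replace each gene's expression value with a quantile bin (0 .. n_bins - 1).

    `sample` maps gene identifiers to continuous expression values for ONE sample.
    Bins are assigned from the within-sample rank; ties keep insertion order.
    """
    genes = sorted(sample, key=sample.get)  # lowest expression first
    n = len(genes)
    return {gene: rank * n_bins // n for rank, gene in enumerate(genes)}


# Hypothetical samples from two platforms with very different intensity scales.
sample_a = {"G1": 2.1, "G2": 8.4, "G3": 5.0, "G4": 11.2, "G5": 7.7}
sample_b = {"G1": 150.0, "G2": 2400.0, "G3": 900.0, "G4": 7800.0}

# Integrate on the genes the two platforms have in common.
shared = sample_a.keys() & sample_b.keys()
bins_a = quantile_discretise({g: sample_a[g] for g in shared})
bins_b = quantile_discretise({g: sample_b[g] for g in shared})
print(bins_a == bins_b)
```

Because only within-sample ranks are used, the two samples receive identical discrete profiles despite their different scales, which is what makes the discretised values comparable across platforms.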

    Factors associated with contraception non-use among women aged 15-24 years in migrant communities in Southern Africa in 2018

    A research report submitted in partial fulfilment of the requirements for the degree of Master of Science in Epidemiology (Epidemiology and Biostatistics) to the Faculty of Health Sciences, School of Public Health, University of the Witwatersrand, Johannesburg, 2022. Introduction: The non-use of contraceptives among young women aged 15-24 years is a public health concern because of its adverse social and reproductive health outcomes. The extent of contraceptive non-use, as well as the factors associated with it among young women in high migrant communities, needs to be investigated further. The aim of this study was to investigate factors associated with contraceptive non-use among young women residing in high migrant communities in Southern Africa. Methods: The study was a secondary data analysis conducted within the larger Sexual and Reproductive Health Rights and HIV Knows No Borders project (SRHR-HIV). The SRHR-HIV project was a longitudinal study which included a cross-sectional baseline survey, conducted in 2018, to collect data on sexual and reproductive health in 10 high migrant communities in six Southern African countries. In this analysis, the study population was restricted to women aged 15-24 years. Descriptive statistics and multivariable logistic regressions were performed to investigate demographic, socio-economic and behavioral factors associated with contraceptive non-use. Additional analyses were conducted using multi-level logistic regressions. Results: Of the 1 242 eligible women aged 15-24 years, 588 reported not having used any method of contraception, representing a prevalence of 47.3%. Overall, 69% of the women were non-migrants, 19% were internal migrants and 12% were international migrants. Contraceptive non-use was high among international migrants (56%) and lowest for internal migrants (38%). Using multivariable logistic regression, age, religion, and ability to refuse sex were significant factors (p<0.05) associated with contraceptive non-use among young women. Contraceptive non-use decreased with increasing age (aOR 0.75, 95% CI 0.69,0.81) and was higher for women unable to refuse sex (aOR 1.39, 95% CI 0.92,2.11). Contraceptive non-use was also higher among women of Catholic (aOR 1.75, 95% CI 0.81,3.77), Protestant (aOR 2.58, 95% CI 1.18,5.68) and Islam (aOR 4.62, 95% CI 1.36,15.73) religions. Conclusion: The findings of this study highlighted the influence of age, religion, and inability to refuse sex as some of the factors associated with contraceptive non-use among young women. These factors were found not to differ among migrants. The findings should be considered and reflected in public health policies to address barriers to the use of contraceptives by women aged 15-24 years.
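
The headline prevalence in this abstract follows directly from the reported counts, and each adjusted odds ratio can be read against its confidence interval. The sketch below is a reader's check, not part of the original analysis: the helper names are hypothetical, and it relies on the usual convention that a 95% CI containing 1 is not statistically significant at the 5% level.

```python
def prevalence_percent(events, total):
    """Prevalence as a percentage of the eligible population."""
    return 100 * events / total


def ci_excludes_null(lower, upper, null=1.0):
    """True when the interval [lower, upper] does not contain the null odds ratio."""
    return not (lower <= null <= upper)


prevalence = prevalence_percent(588, 1242)
print(round(prevalence, 1))  # prints 47.3

# 95% confidence intervals for the adjusted odds ratios as reported in the abstract.
reported_ci = {
    "age": (0.69, 0.81),
    "unable to refuse sex": (0.92, 2.11),
    "Catholic": (0.81, 3.77),
    "Protestant": (1.18, 5.68),
    "Islam": (1.36, 15.73),
}
excludes_null = {name: ci_excludes_null(*ci) for name, ci in reported_ci.items()}
print(excludes_null)
```

On the intervals as printed, age, Protestant and Islam exclude 1, while the intervals for inability to refuse sex and for Catholic religion include it.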