152 research outputs found

    Simultaneous mapping of multiple gene loci with pooled segregants

    The analysis of polygenic phenotypic characteristics, such as quantitative traits or inheritable diseases, remains an important challenge. It requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulked segregant analysis (BSA), should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. The gene mapping process applied here consists of three steps. The first is a controlled crossing of parents with and without a trait. The second is selection based on phenotypic screening of the offspring, followed by the mapping of short offspring sequences against the parental reference. The final step aims at detecting genetic markers such as SNPs, insertions and deletions with next generation sequencing (NGS). Markers in close proximity to genomic loci that are associated with the trait have a higher probability of being inherited together with them. Hence, these markers are very useful for discovering the loci and the genetic mechanism underlying the characteristic of interest. Within this context, NGS produces binomial counts along the genome, i.e., the number of sequenced reads that match the SNP of the parental reference strain, which is a proxy for the number of individuals in the offspring that share the SNP with the parent. Genomic loci associated with the trait can thus be discovered by analyzing trends in the counts along the genome. We exploit the link between smoothing splines and generalized mixed models for estimating the underlying structure present in the SNP scatterplots.
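The idea of reading trends in binomial counts along the genome can be illustrated with a much simpler stand-in than the spline and mixed-model approach the abstract describes. The sketch below pools reference-matching reads and read depths inside a sliding window around each SNP and reports the pooled proportion. The function name, the window half-width, and the toy counts are all hypothetical and chosen only for illustration.

```python
def windowed_allele_frequency(positions, ref_counts, depths, half_width):
    """Pooled proportion of reference-matching reads in a window around each SNP.

    This is a plain sliding-window estimate, NOT the smoothing-spline /
    generalized mixed model described in the abstract.
    """
    smoothed = []
    for center in positions:
        matched = 0  # reads matching the parental reference allele
        total = 0    # total reads covering SNPs in the window
        for pos, ref, depth in zip(positions, ref_counts, depths):
            if abs(pos - center) <= half_width:
                matched += ref
                total += depth
        smoothed.append(matched / total if total else float("nan"))
    return smoothed


# Toy data (made up): three SNPs, each covered by 10 reads.
positions = [100, 200, 300]
ref_counts = [5, 8, 10]
depths = [10, 10, 10]
freqs = windowed_allele_frequency(positions, ref_counts, depths, half_width=100)
```

In a pooled-segregant setting, a region where the smoothed proportion drifts away from 0.5 toward 0 or 1 would be a candidate for linkage to the selected trait; a proper analysis would also quantify the uncertainty of the trend.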

    Multi-state models for the analysis of time-to-treatment modification among HIV patients under highly active antiretroviral therapy in Southwest Ethiopia

    Background: Highly active antiretroviral therapy (HAART) has brought a dramatic change in controlling the burden of HIV/AIDS. However, the new challenge of HAART is long-term sustainability. Toxicities, comorbidity, pregnancy, and treatment failure, among others, can result in frequent changes to the initial HAART regimen. The aim of this study was to evaluate the durability of first-line antiretroviral therapy and to assess the causes of initial HAART regimen changes among patients on HAART. Methods: A hospital-based retrospective study was conducted from January 2007 to August 2013 at Jimma University Hospital, Southwest Ethiopia. Data on the prescribed ARV, along with start date, switching date, and reason for change, were collected. The primary outcome was defined as the time to treatment change. We adopted a multi-state survival modeling approach, treating each treatment regimen as a state, and estimated the probability of patients moving from one regimen to another. Results: A total of 1284 ART-naive patients were included in the study. Almost half of the patients (41.2%) changed their treatment during follow-up for various reasons; 442 (34.4%) changed once and 86 (6.69%) changed more than once. Toxicity was the most common reason for treatment changes, accounting for 48.94% of the changes, followed by comorbidity (new TB) at 14.31%. The HAART combinations that were most robust to treatment changes were tenofovir (TDF) + lamivudine (3TC) + efavirenz (EFV), tenofovir (TDF) + lamivudine (3TC) + nevirapine (NVP), and zidovudine (AZT) + lamivudine (3TC) + nevirapine (NVP), with 3.6%, 4.5% and 11% treatment changes, respectively. Conclusion: Moving away from drugs with poor safety profiles, such as stavudine (d4T), could reduce modification rates and would improve regimen tolerability while preserving future treatment options.
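To make the notion of a transition probability between regimens concrete, here is a minimal sketch that assumes a simplified discrete-time view: each patient's record is a list of the regimen observed at successive visits, and the probability of moving from regimen A to regimen B is estimated as the share of A-visits that are followed by B. This ignores censoring and elapsed time, so it is not the multi-state survival model used in the study. The function name and the regimen histories are hypothetical.

```python
from collections import Counter, defaultdict


def transition_proportions(histories):
    """Row-normalized transition counts between consecutive observed states."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair each visit with the next one: (state now, state at next visit).
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Made-up visit histories; "A" and "B" stand in for two regimens.
histories = [
    ["A", "A", "B"],
    ["A", "B"],
    ["A", "A"],
]
probs = transition_proportions(histories)
```

With these toy histories, regimen A is followed by A twice and by B twice, so both estimated probabilities are 0.5. A time-to-event analysis would instead estimate transition intensities and handle patients who are lost to follow-up.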

    beadarrayFilter : an R package to filter beads

    Microarrays enable the expression levels of thousands of genes to be measured simultaneously. However, only a small fraction of these genes are expected to be expressed under different experimental conditions. Filtering has therefore been introduced as a step in the microarray preprocessing pipeline. Gene filtering aims at reducing the dimensionality of the data by filtering out redundant features prior to the actual statistical analysis. Previous filtering methods focus on the Affymetrix platform and cannot be easily ported to the Illumina platform. As such, we developed a filtering method for Illumina bead arrays, and an R package, beadarrayFilter, to implement it. In this paper, the main functions in the package are highlighted and, using many examples, we illustrate how beadarrayFilter can be used to filter bead arrays.

    Choice of initial antiretroviral drugs and treatment outcomes among HIV-infected patients in sub-Saharan Africa: systematic review and meta-analysis of observational studies

    Background: The effectiveness of antiretroviral therapy (ART) depends on the choice of regimen at initiation. Most evidence from developed countries indicates that there is a difference between efavirenz (EFV) and nevirapine (NVP). However, the evidence is limited in resource-poor countries, particularly in Africa. Thus, this systematic review and meta-analysis was carried out to summarize reported long-term treatment outcomes among people on first-line therapy in sub-Saharan Africa. Methods: Observational studies that reported an odds ratio, relative risk, hazard ratio, or standardized incidence ratio to compare the risk of treatment failure among HIV/AIDS patients who initiated ART with EFV versus NVP were systematically searched. Searches were conducted using the MEDLINE database within PubMed, Google Scholar, HINARI, and ResearchGate between 2007 and 2016. Information was extracted using a standardized form. Pooled risk ratios (RR) and 95% confidence intervals (CI) were calculated using the random-effects, generic inverse variance method. Results: A total of 6394 articles were identified, of which 29 were eligible for review and abstraction in sub-Saharan Africa. Seventeen articles were used for the meta-analysis. Of a total of 121,092 independent study participants, 76,719 (63.36%) were female, and 40,480 (33.43%) initiated with an NVP-containing regimen. Two studies did not report the median CD4 cell counts at initiation. Patients with low CD4 cell counts were initiated on EFV-containing regimens. The pooled effect size indicated that treatment failure was reduced by 15% (pooled RR 0.85; 95% CI: 0.75–0.98) and non-nucleoside reverse transcriptase inhibitor (NNRTI) switch was reduced by 43% (pooled RR 0.57; 95% CI: 0.37–0.89). Conclusion: The risks of treatment failure and NNRTI switch were lower in patients who initiated with an EFV-containing regimen than in those who initiated with an NVP-containing regimen. The review suggests that initiating patients on an EFV-containing regimen will reduce treatment failure and NNRTI switch.
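The random-effects, generic inverse variance pooling mentioned in the methods can be sketched as follows. The abstract does not say which between-study variance estimator was used, so this sketch assumes the common DerSimonian-Laird estimator. The function name and the input values are hypothetical and are not taken from the review.

```python
import math


def pool_random_effects(log_rr, se, z=1.96):
    """Pool log risk ratios with inverse-variance weights.

    Assumes the DerSimonian-Laird estimate of between-study variance (tau^2);
    returns (pooled RR, lower CI bound, upper CI bound, tau^2).
    """
    w = [1.0 / s ** 2 for s in se]                        # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in se]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
        tau2,
    )


# Made-up inputs: two studies reporting the same risk ratio of 0.8.
rr, lower, upper, tau2 = pool_random_effects(
    [math.log(0.8), math.log(0.8)], [0.1, 0.1]
)
```

A pooled RR below 1 is read as a relative reduction of (1 − RR), which is how an RR of 0.85 corresponds to a 15% reduction and an RR of 0.57 to a 43% reduction.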

    A random effects model for the identification of differential splicing (REIDS) using exon and HTA arrays

    Background: Alternative gene splicing is a common phenomenon in which a single gene gives rise to multiple transcript isoforms. The process is strictly guided and involves a multitude of proteins and regulatory complexes. Unfortunately, aberrant splicing events do occur and have been linked to genetic disorders, such as several types of cancer and neurodegenerative diseases (Fan et al., Theor Biol Med Model 3:19, 2006). Therefore, understanding the mechanism of alternative splicing and identifying the differences in splicing events between diseased and healthy tissue is crucial in biomedical research, with potential applications in personalized medicine as well as in drug development. Results: We propose a linear mixed model, Random Effects for the Identification of Differential Splicing (REIDS), for the identification of alternative splicing events. Based on two scores, an exon score and an array score, a decision regarding alternative splicing can be made. The model makes it possible to distinguish a differentially expressed gene from a differentially spliced exon. The proposed model was applied to three case studies concerning both exon and HTA arrays. Conclusion: The REIDS model provides a workflow for the identification of alternative splicing events relying on the established linear mixed model. The model can be applied to different types of arrays.

    Using transcriptomics to guide lead optimization in drug discovery projects : lessons learned from the QSTAR project

    The pharmaceutical industry is faced with steadily declining R&D efficiency which results in fewer drugs reaching the market despite increased investment. A major cause for this low efficiency is the failure of drug candidates in late-stage development owing to safety issues or previously undiscovered side-effects. We analyzed to what extent gene expression data can help to de-risk drug development in early phases by detecting the biological effects of compounds across disease areas, targets and scaffolds. For eight drug discovery projects within a global pharmaceutical company, gene expression data were informative and able to support go/no-go decisions. Our studies show that gene expression profiling can detect adverse effects of compounds, and is a valuable tool in early-stage drug discovery decision making

    A Roadmap for Building Data Science Capacity for Health Discovery and Innovation in Africa

    Technological advances now make it possible to generate diverse, complex data of varying sizes in a wide range of applications, from business to engineering to medicine. In the health sciences in particular, data are being produced at an unprecedented rate across the full spectrum of scientific inquiry, spanning basic biology, clinical medicine, public health and health care systems. Leveraging these data can accelerate scientific advances, health discovery and innovation. However, data are just the raw material required to generate new knowledge, not knowledge on their own, just as a pile of bricks would not be mistaken for a building. In order to solve complex scientific problems, appropriate methods, tools and technologies must be integrated with domain expertise to generate and analyze big data. This integrated interdisciplinary approach is what has come to be widely known as data science. Although the discipline of data science has been evolving rapidly over the past couple of decades in resource-rich countries, the situation is bleak in resource-limited settings, such as most countries in Africa, primarily due to a lack of well-trained data scientists. In this paper, we highlight a roadmap for building capacity in health data science in Africa to help spur health discovery and innovation, and propose a potential sustainable solution consisting of three key activities: graduate-level training, faculty development, and stakeholder engagement. We also outline potential challenges and mitigating strategies.

    Epidemiology of Mycobacterium tuberculosis lineages and strain clustering within urban and peri-urban settings in Ethiopia.

    Background: Previous work has shown differential predominance of certain Mycobacterium tuberculosis (M. tb) lineages and sub-lineages among different human populations in diverse geographic regions of Ethiopia. Nevertheless, how strain diversity is evolving under the ongoing rapid socio-economic and environmental changes is poorly understood. The present study investigated factors associated with M. tb lineage predominance and rate of strain clustering within urban and peri-urban settings in Ethiopia. Methods: Pulmonary tuberculosis (PTB) and cervical tuberculous lymphadenitis (TBLN) patients who visited selected health facilities were recruited in 2016 and 2017. A total of 258 M. tb isolates identified from 163 sputa and 95 fine-needle aspirates (FNA) were characterized by spoligotyping and compared with international M. tb spoligotyping patterns registered in the SITVIT2 database. The molecular data were linked with clinical and demographic data of the patients for further statistical analysis. Results: From a total of 258 M. tb isolates, 84 distinct spoligotype patterns were identified, comprising 58 known Shared International Type (SIT) patterns and 26 new or orphan patterns. The majority of strains belonged to two major M. tb lineages, L3 (35.7%) and L4 (61.6%). The observed high percentage of isolates with shared patterns (n = 200/258) suggested a substantial rate of overall clustering (77.5%). After adjusting for the effect of geographical variations, the clustering rate was significantly lower among individuals co-infected with HIV and other concomitant chronic disease. Compared to L4, the adjusted odds ratio and 95% confidence interval (AOR; 95% CI) indicated that infections with L3 M. tb strains were more likely to be associated with TBLN [3.47 (1.45, 8.29)] and TB-HIV co-infection [2.84 (1.61, 5.55)]. Conclusion: Despite the observed difference in strain diversity and geographical distribution of M. tb lineages compared to earlier studies in Ethiopia, the overall rate of strain clustering suggests higher transmission and warrants more detailed investigations into the molecular epidemiology of TB and related factors.

    A closer look on spatiotemporal variations of dissolved oxygen in waste stabilization ponds using mixed models

    Dissolved oxygen is an essential controlling factor in the performance of facultative and maturation ponds, since both benefit from algal photosynthetic oxygenation. The rate of this photosynthesis strongly depends on the time of day and the location in a pond system, whose roles have been overlooked in previous guidelines for pond operation and maintenance (O&M). To elucidate these influences, a linear mixed-effects model (LMM) was built on the data collected from three intensive sampling campaigns in a waste stabilization pond in Cuenca, Ecuador. Within two parallel lines of facultative and maturation ponds, nine locations were sampled at two depths in each pond. In general, the output of the mixed model indicated high spatial autocorrelation in the data and wide spatiotemporal variations of the oxygen level among and within the ponds. In particular, different ponds showed different patterns of oxygen dynamics, which were associated with many factors, including flow behavior, sludge accumulation, algal distribution, influent fluctuation, and pond function. Moreover, a substantial temporal change in the oxygen level between day and night, from zero to above 20 mg O₂ L⁻¹, was observed. Algal photosynthetic activity appeared to be the main reason for these variations in the model, as it was facilitated by intensive solar radiation at high altitude. Since these diurnal and spatial patterns can supply a large amount of useful information on pond performance, recommendations on dissolved oxygen (DO) monitoring and regulation were delivered. More importantly, as the mixed model showed high predictive performance, i.e., high goodness-of-fit (R² of 0.94) and low values of mean absolute error, we recommend this advanced statistical technique as an effective tool for dealing with high autocorrelation of data in pond systems.

    Short-term real-time prediction of total number of reported COVID-19 cases and deaths in South Africa : a data driven approach

    BACKGROUND: The rising burden of the ongoing COVID-19 epidemic in South Africa has motivated the application of modeling strategies to predict the COVID-19 cases and deaths. Reliable and accurate short and long-term forecasts of COVID-19 cases and deaths, both at the national and provincial level, are a key aspect of the strategy to handle the COVID-19 epidemic in the country. METHODS: In this paper we apply the previously validated approach of phenomenological models, fitting several nonlinear growth curves (Richards, 3 and 4 parameter logistic, Weibull and Gompertz), to produce short term forecasts of COVID-19 cases and deaths at the national level as well as the provincial level. Using publicly available daily reported cumulative case and death data up until 22 June 2020, we report 5, 10, 15, 20, 25 and 30-day ahead forecasts of cumulative cases and deaths. All predictions are compared to the actual observed values in the forecasting period. RESULTS: We observed that all models for cases provided accurate and similar short-term forecasts for a period of 5 days ahead at the national level, and that the three and four parameter logistic growth models provided more accurate forecasts than that obtained from the Richards model 10 days ahead. However, beyond 10 days all models underestimated the cumulative cases. Our forecasts across the models predict an additional 23,551–26,702 cases in 5 days and an additional 47,449–57,358 cases in 10 days. While the three parameter logistic growth model provided the most accurate forecasts of cumulative deaths within the 10 day period, the Gompertz model was able to better capture the changes in cumulative deaths beyond this period. Our forecasts across the models predict an additional 145–437 COVID-19 deaths in 5 days and an additional 243–947 deaths in 10 days. CONCLUSIONS: By comparing both the predictions of deaths and cases to the observed data in the forecasting period, we found that this modeling approach provides reliable and accurate forecasts for a maximum period of 10 days ahead. http://www.biomedcentral.com/bmcmedresmethodol
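For readers unfamiliar with the growth curves named in the methods, the sketch below writes out three of them as plain functions of time. Parameterizations of these curves vary between papers, and the abstract does not give the ones used, so the forms here are one common choice and should be read as assumptions. The parameter names `K` (final size), `r` (growth rate), `t0` (time shift) and `nu` (Richards shape) are illustrative, as are the numeric values.

```python
import math


def logistic3(t, K, r, t0):
    """Three-parameter logistic curve; reaches K / 2 at t = t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t, K, r, t0):
    """Gompertz curve; reaches K / e at t = t0."""
    return K * math.exp(-math.exp(-r * (t - t0)))


def richards(t, K, r, t0, nu):
    """Richards (generalized logistic) curve; nu = 1 recovers logistic3."""
    return K * (1.0 + nu * math.exp(-r * (t - t0))) ** (-1.0 / nu)


# Made-up parameters: cumulative counts levelling off at 1000.
K, r, t0 = 1000.0, 0.2, 10.0
curve = [logistic3(day, K, r, t0) for day in range(0, 31, 5)]
```

In a forecasting workflow such as the one described, the parameters would be estimated from the reported cumulative counts by nonlinear least squares or a similar method, and the fitted curve would then be evaluated at future days.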