38 research outputs found

    Human papilloma virus vaccination programs reduce health inequity in most scenarios: a simulation study

    Background: The global and within-country epidemiology of cervical cancer exemplifies health inequity. Public health programs may reduce absolute risk but increase inequity; inequity may be further compounded by screening programs. In this context, we aimed to explore the impact that human papillomavirus (HPV) vaccination might have on health equity, allowing for uncertainty surrounding the long-term effect of HPV vaccination programs. Methods: A simple static multi-way sensitivity analysis was carried out to compare the relative risk, after versus before implementation of a vaccination program, of infections which would cause invasive cervical cancer if neither prevented nor detected, using plausible ranges of vaccine effectiveness, vaccination coverage, screening sensitivity, screening uptake and changes in uptake. Results: We considered a total of 3,793,902 scenarios. In 63.9% of scenarios considered, vaccination would lead to a better outcome for a population or subgroup with that combination of parameters. Regardless of vaccine effectiveness and coverage, most simulations led to lower rates of disease. Conclusions: If vaccination coverage and screening uptake are high, then communities are always better off with a vaccination program. The findings highlight the importance of achieving and maintaining high immunization coverage and screening uptake in high-risk groups in the interest of health equity.
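A multi-way sensitivity analysis like the one described in the Methods above can be sketched as a sweep over a parameter grid. The risk model here is a hypothetical simplification and not the authors' model: it assumes the residual risk is (1 - effectiveness × coverage) × (1 - sensitivity × uptake). The function names and grid values are illustrative only.

```python
from itertools import product

def residual_risk(effectiveness, coverage, sensitivity, uptake):
    # Assumed toy model: fraction of infections neither prevented by
    # vaccination nor detected by screening.
    return (1 - effectiveness * coverage) * (1 - sensitivity * uptake)

def relative_risk(effectiveness, coverage, sensitivity, uptake_before, uptake_change):
    # Risk after the program divided by risk before it (no vaccine before).
    uptake_after = min(1.0, max(0.0, uptake_before + uptake_change))
    before = residual_risk(0.0, 0.0, sensitivity, uptake_before)
    after = residual_risk(effectiveness, coverage, sensitivity, uptake_after)
    return after / before

def sweep(grid):
    # Evaluate every combination of parameter values in the grid.
    results = []
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        results.append((params, relative_risk(**params)))
    return results

# Illustrative ranges only; the paper's actual ranges are not given here.
GRID = {
    "effectiveness": [0.5, 0.7, 0.9],
    "coverage": [0.3, 0.6, 0.9],
    "sensitivity": [0.5, 0.7],
    "uptake_before": [0.4, 0.6, 0.8],
    "uptake_change": [-0.3, -0.1, 0.0],
}

results = sweep(GRID)
better = sum(1 for _, rr in results if rr < 1.0)
print(f"{better} of {len(results)} scenarios have relative risk below 1")
```

Under this toy model, a scenario is worse off after vaccination only when a fall in screening uptake outweighs the protection gained from the vaccine.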

    Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study

    Multivariate linear growth curves were used to model high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG), and systolic blood pressure (SBP) measured during four exams from 1659 independent individuals from the Framingham Heart Study. The slopes and intercepts from each of two phenotype models were tested for association with 348,053 autosomal single-nucleotide polymorphisms (SNPs) from the Affymetrix GeneChip 500K set. Three regions were associated with LDL intercept, TG slope, and SBP intercept (p < 1.44 × 10⁻⁷). We observed results consistent with previously reported associations between rs599839, on chromosome 1p13, and LDL. We note that the association is significant with LDL intercept but not slope. Markers on chromosome 17q25 were associated with TG slope, and a single-nucleotide polymorphism on chromosome 7p11 was associated with SBP intercept. Growth curve models can be used to gain more insight into the relationships between SNPs and traits than traditional association analysis when longitudinal data have been collected. The power to detect association with changes over time may be limited if the subjects are not followed over a long enough time period.
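The significance threshold quoted above appears consistent with a Bonferroni correction of a 0.05 level across the 348,053 SNPs tested. The abstract does not name the correction, so this is an assumption. A quick check of the arithmetic:

```python
# Assumed family-wise significance level; not stated in the abstract.
ALPHA = 0.05
# Number of autosomal SNPs tested, as reported in the abstract.
N_SNPS = 348_053

# Bonferroni-adjusted per-test threshold.
threshold = ALPHA / N_SNPS
print(f"{threshold:.3g}")
```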

    Determining relative importance of variables in developing and validating predictive models

    Background: Multiple regression models are used in a wide range of scientific disciplines and automated model selection procedures are frequently used to identify independent predictors. However, determining the relative importance of potential predictors and validating the fitted models for their stability, predictive accuracy and generalizability are often overlooked or not done thoroughly. Methods: Using a case study aimed at predicting children with acute lymphoblastic leukemia (ALL) who are at low risk of Tumor Lysis Syndrome (TLS), we propose and compare two strategies, bootstrapping and random split of data, for ordering potential predictors according to their relative importance with respect to model stability and generalizability. We also propose an approach based on relative increase in percentage of explained variation and area under the Receiver Operating Characteristic (ROC) curve for developing models where variables from our ordered list enter the model according to their importance. An additional data set aimed at identifying predictors of prostate cancer penetration is also used for illustrative purposes. Results: Age is chosen to be the most important predictor of TLS. It is selected 100% of the time using the bootstrapping approach. Using the random split method, it is selected 99% of the time in the training data and is significant (at the 5% level) 98% of the time in the validation data set. This indicates that age is a stable predictor of TLS with good generalizability. The second most important variable is white blood cell count (WBC). Our methods also identified an important predictor of TLS that would have been omitted if relying on any of the automated model selection procedures alone. A group at low risk of TLS consists of children younger than 10 years of age, without T-cell immunophenotype, whose baseline WBC is < 20 × 10⁹/L and palpable spleen is < 2 cm. For the prostate cancer data set, the Gleason score and digital rectal exam are identified as the most important indicators of whether the tumor has penetrated the prostate capsule. Conclusion: Our model selection procedures based on bootstrap re-sampling and repeated random split techniques can be used to assess the strength of evidence that a variable is truly an independent and reproducible predictor. Our methods, therefore, can be used for developing stable and reproducible models with good performance. Moreover, our methods can serve as a good tool for validating a predictive model. Previous biological and clinical studies support the findings based on our selection and validation strategies. However, extensive simulations may be required to assess the performance of our methods under different scenarios as well as to check their sensitivity to random fluctuations in the data.
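The bootstrap strategy of ordering predictors by how often they are selected can be sketched as follows. This is a minimal illustration and not the authors' procedure: the selection rule (absolute Pearson correlation above a threshold), the continuous synthetic outcome, and all names are assumptions made for the example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    # Sample Pearson correlation; returns 0.0 if either input is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def selection_frequency(predictors, outcome, n_boot=200, threshold=0.5, seed=0):
    # Resample rows with replacement and count how often each predictor
    # passes the (assumed) selection rule |r| > threshold.
    rng = random.Random(seed)
    n = len(outcome)
    counts = {name: 0 for name in predictors}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [outcome[i] for i in idx]
        for name, values in predictors.items():
            x = [values[i] for i in idx]
            if abs(pearson(x, y)) > threshold:
                counts[name] += 1
    return {name: c / n_boot for name, c in counts.items()}

# Synthetic data: the outcome depends on "age" but not on "noise".
rng = random.Random(1)
age = [rng.uniform(0, 18) for _ in range(100)]
noise = [rng.gauss(0, 1) for _ in range(100)]
outcome = [a + rng.gauss(0, 1) for a in age]

freq = selection_frequency({"age": age, "noise": noise}, outcome)
ranking = sorted(freq, key=freq.get, reverse=True)
print(freq, ranking)
```

A predictor selected in nearly every resample is a stable candidate. One selected only occasionally is likely an artifact of a particular sample.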

    Pathway-based analysis of a genome-wide case-control association study of rheumatoid arthritis

    Evaluation of the association between single-nucleotide polymorphisms (SNPs) and disease outcomes is widely used to identify genetic risk factors for complex diseases. Although this analysis paradigm has made significant progress in many genetic studies, many challenges remain, such as the requirement of a large sample size to achieve adequate power. Here we use rheumatoid arthritis (RA) as an example and explore a new analysis strategy: pathway-based analysis to search for related genes and SNPs contributing to the disease.

    Data Integration in Genetics and Genomics: Methods and Challenges

    Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interactions. Each of these distinct data types provides a different, partly independent and complementary, view of the whole genome. However, understanding functions of genes, proteins, and other aspects of the genome requires more information than any one of these datasets provides. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays an important role in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is developed around the key steps in genetic, genomic, and proteomic data fusion. Secondly, we provide a review of some of the most commonly used current methods and approaches for combining genomic data, with a focus on the statistical aspects.

    Potential risk factors associated with human encephalitis: application of canonical correlation analysis

    Background: Infection of the CNS is considered to be the major cause of encephalitis and more than 100 different pathogens have been recognized as causative agents. Despite being identified worldwide as an important public health concern, studies on encephalitis are very few and often focus on particular types (with respect to causative agents) of encephalitis (e.g. West Nile, Japanese, etc.). Moreover, a number of other infectious and non-infectious conditions present with similar symptoms, and distinguishing encephalitis from these mimicking conditions continues to be a challenging task. Methods: We used canonical correlation analysis (CCA) to assess associations between a set of exposure variables and a set of symptom and diagnostic variables in human encephalitis. The data consist of 208 confirmed cases of encephalitis from a prospective multicenter study conducted in the United Kingdom. We used a covariance matrix based on Gini's measure of similarity and used permutation-based approaches to test the significance of canonical variates. Results: Results show that weak pair-wise correlation exists between the risk factor (exposure and demographic) and symptom/laboratory variables. However, the first canonical variate from CCA revealed strong multivariate correlation (ρ = 0.71, se = 0.03, p = 0.013) between the two sets. We found a moderate correlation (ρ = 0.54, se = 0.02) between the variables in the second canonical variate; however, the value is not statistically significant (p = 0.68). Our results also show that a very small amount of the variation in the symptom sets is explained by the exposure variables. This indicates that host factors, rather than environmental factors, might be important for understanding the etiology of encephalitis and for facilitating early diagnosis and treatment of encephalitis patients. Conclusions: There is no standard laboratory diagnostic strategy for investigation of encephalitis and even experienced physicians are often uncertain about the cause, appropriate therapy and prognosis of encephalitis. Exploration of human encephalitis data using advanced multivariate statistical modelling approaches that can capture the inherent complexity in the data is, therefore, crucial in understanding the causes of human encephalitis. Moreover, application of multivariate exploratory techniques will generate clinically important hypotheses and offer useful insight into the number and nature of variables worthy of further consideration in a confirmatory statistical analysis.
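The permutation-based significance testing mentioned in the Methods can be illustrated on a much simpler statistic. The sketch below tests a single Pearson correlation rather than a canonical correlation, so it is a stand-in for the idea and not the authors' analysis. The function names and the example data are assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    # Sample Pearson correlation; returns 0.0 if either input is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_perm=999, seed=0):
    # Shuffle one variable to break any association, then count how often
    # the shuffled |r| is at least as large as the observed |r|.
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Illustrative data: a perfectly linear relationship.
xs = list(range(30))
ys = [2 * x + 1 for x in xs]
p_linear = permutation_p_value(xs, ys)
print(p_linear)
```

The same resampling logic carries over to CCA by recomputing the canonical correlation after permuting the rows of one variable set.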

    Application of biomedical informatics to chronic pediatric diseases: a systematic review

    Background: Chronic diseases affect millions of children worldwide, leading to substantial disease burden for the children and their families as well as escalating health care costs. The increasing trend in the prevalence of complex pediatric chronic diseases requires innovative and optimal delivery of care. Biomedical informatics applications play an important role in improving health outcomes while being cost-effective. However, their utility in pediatric chronic diseases has not been studied in a comprehensive and systematic way. The objective of this study was to conduct a systematic review of the effects of biomedical informatics applications in pediatric chronic diseases. Methods: A comprehensive literature search was conducted using MEDLINE, the Cochrane Library and EMBASE databases from inception of each database to September 2008. We included studies of any methodological type and any language that applied biomedical informatics to chronic conditions in children and adolescents 18 years of age or younger. Two independent reviewers carried out study selection and data extraction. Quality assessment was performed using a study design evaluation instrument to appraise the strength of the studies and their methodological adequacy. Because of heterogeneity in the conditions and outcomes we studied, a formal meta-analysis was not performed. Results: Based on our search strategy, 655 titles and abstracts were reviewed. From this set we identified 27 relevant articles that met our inclusion criteria. The results from these studies indicated that biomedical informatics applications have favourable clinical and patient outcomes including, but not limited to, reduced number of emergency room visits, improved knowledge of disease management, and enhanced satisfaction. Seventy percent of reviewed papers were published after 2000, 89% of users were patients and 11% were either providers or caregivers. The majority (96%) of the selected studies reported improved outcomes. Conclusion: Published studies suggested positive impacts of informatics predominantly in pediatric asthma. As electronic tools become more widely adopted, there will be opportunities to improve patient care in a wide range of chronic illnesses through informatics solutions.

    A Multivariate Growth Curve Model for Ranking Genes in Replicated Time Course Microarray Data

    The gene ranking problem in time course microarray experiments is challenging because gene expression levels at different time points are correlated. This is because expression values at successive time points are usually taken from the same organism, tissue or culture. Moreover, time dependency of gene expression values is usually of interest and often is the biological problem that motivates the experiment. We propose a multivariate growth curve model for ranking genes and estimating mean gene expression profiles in replicated time course microarray data. The approach takes the within-individual correlation as well as the temporal ordering into consideration. Moreover, time is incorporated as a continuous variable in the model to account for the temporal pattern. Polynomial profiles are assumed to describe the time dependence and a transformation incorporating information across the genes is used. A moderated likelihood ratio test is then applied to the transformed data to get a statistic for ranking genes according to the difference in expression profiles among biological groups. The methodology is presented in a general setup and can be used for one-sample as well as multi-sample problems. The estimation is done in a multivariate framework in which information from all the groups involved is used for better inference. Moreover, the within-individual correlation as well as information across genes enter the estimation through a moderated covariance matrix. We assess the performance of our method using simulation studies and illustrate the results with publicly available real time course microarray data.