25 research outputs found

    Augmenting cancer registry data with health survey data with no cases in common : the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer

    Background: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between datasets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time. Methods: Six measures of pre-diagnosis health behaviours (focussing on tobacco smoking, ‘at risk’ alcohol consumption, overweight and exercise) were imputed for 28,000 cancer registry data records of US oesophageal cancers using cold deck imputation from an unrelated health behaviour dataset. Each data point was imputed twice. This calibration allowed us to estimate the misclassification rate. We applied statistical correction for the misclassification to estimate the relative risk of dying within 1 year of diagnosis for each of the imputed behaviour variables. Subgroup analyses were conducted for adenocarcinoma and squamous cell carcinoma separately. Results: Simulated survival data confirmed that accurate estimates of true relative risks could be retrieved for health behaviours with greater than 5% prevalence, although confidence intervals were wide. Applied to real datasets, the estimated relative risks were largely consistent with current knowledge. For example, tobacco smoking status 5 years prior to diagnosis was associated with an increased age-adjusted risk of all-cause death within 1 year of diagnosis for oesophageal squamous cell carcinoma (RR = 1.99, 95% CI 1.24, 3.12) but not oesophageal adenocarcinoma (RR = 1.61, 95% CI 0.79, 2.57). Conclusions: We have demonstrated a novel imputation-based algorithm for augmenting cancer registry data for epidemiological research which can be used when there are no cases in common between datasets. The algorithm allows investigation of research questions which could not be addressed through direct data linkage.

    Androgen deprivation in prostate cancer : benefits of home-based resistance training

    Introduction: Androgen deprivation therapy (ADT) has detrimental effects on body composition, metabolic health, physical functioning, bone mineral density (BMD) and health-related quality of life (HRQOL) in men with prostate cancer. We investigated whether a 12-month home-based progressive resistance training (PRT) programme, instituted at the start of ADT, could prevent these adverse effects. Methods: Twenty-five patients scheduled to receive at least 12 months of ADT were randomly assigned to either usual care (UC) (n = 12) or PRT (n = 13) starting immediately after their first ADT injection. Body composition, body cell mass (BCM; a functional component of lean body mass), BMD, physical function, insulin sensitivity and HRQOL were measured at 6 weeks and 6 and 12 months. Data were analysed by a linear mixed model. Results: ADT had a negative impact on body composition, BMD, physical function, glucose metabolism and HRQOL. At 12 months, the UC group had greater reductions in BCM by −1.9 ± 0.8% (p = 0.02) and higher gains in fat mass by 3.1 ± 1.0% (p = 0.002), compared to the PRT group. HRQOL domains were maintained or improved in the PRT versus UC group at 6 weeks (general health, p = 0.04), 6 months (vitality, p = 0.02; social functioning, p = 0.03) and 12 months (mental health, p = 0.01; vitality, p = 0.02). A significant increase in the Matsuda Index in the PRT versus UC group was noted at 6 weeks (p = 0.009) but this difference was not maintained at subsequent timepoints. Between-group differences favouring the PRT group were also noted for physical activity levels (step count) (p = 0.02). No differences in measures of BMD or physical function were detected at any time point. Conclusion: A home-based PRT programme instituted at the start of ADT may counteract detrimental changes in body composition and improve physical activity and mental health over 12 months. Trial registration: Australian and New Zealand Clinical Trials Registry, ACTRN1261600131144

    k-link EST clustering : evaluating error introduced by chimeric sequences under different degrees of linkage

    Motivation: The clustering of expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, while uncommon, can make achieving the optimal EST clustering a challenge. Single-linkage algorithms are particularly vulnerable to the effects of chimeras. To avoid chimera-facilitated erroneous merges, researchers using single-linkage algorithms are forced to use stringent sequence-similarity thresholds. Such thresholds reduce the sensitivity of the clustering algorithm. Results: We introduce the concept of k-link clustering for EST data. We evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases in response to increasing the number of shared ESTs (i.e. links) required. We observe a base level of Type II error likely caused by the presence of unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by increased linkage requirements, we propose an extension to k-link which modifies the required number of links with respect to the size of the clusters being compared.

    Variable penalty dynamic time warping code for aligning mass spectrometry chromatograms in R

    Alignment of mass spectrometry (MS) chromatograms is sometimes required prior to sample comparison and data analysis. Without alignment, direct comparison of chromatograms would lead to inaccurate results. We demonstrate a new method for computing a high-quality alignment of full-length MS chromatograms using variable penalty dynamic time warping. This method aligns signals using local linear shifts without excessive warping that can alter the shape (and area) of chromatogram peaks. The software is available as the R package VPdtw on the Comprehensive R Archive Network, and we highlight here how one can use this package.
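
To illustrate the general idea of penalising warping, here is a minimal Python sketch of dynamic time warping with an additive penalty on every non-diagonal step. It is not the VPdtw implementation and does not use its R interface: the function name `penalized_dtw_cost` is hypothetical, and the penalty here is a single constant, whereas the method in the abstract uses a variable penalty.

```python
def penalized_dtw_cost(query, reference, penalty=0.0):
    """Return the minimum alignment cost between two 1-D signals.

    Each non-diagonal step (stretching or compressing the query) adds
    `penalty` to the cost, which discourages excessive warping.
    Assumption: a constant penalty; VPdtw itself varies the penalty.
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],          # diagonal step: no warping
                cost[i - 1][j] + penalty,    # query advances alone
                cost[i][j - 1] + penalty,    # reference advances alone
            )
    return cost[n][m]
```

With `penalty=0.0` this reduces to ordinary dynamic time warping; raising the penalty makes the alignment prefer plain shifts over reshaping peaks.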

    The effect on accuracy of tweet sample size for hashtag segmentation dictionary construction

    Automatic hashtag segmentation is used when analysing Twitter data, to associate hashtag terms with those used in common language. The most common form of hashtag segmentation uses a dictionary with a probability distribution over the dictionary terms, constructed from sample texts specific to the given hashtag domain. The language used on Twitter differs from the common language found in published literature, most likely due to the tweet character limit; dictionaries constructed to perform hashtag segmentation should therefore be derived from a random sample of tweets. We ask the question “How large should our sample of tweets be to obtain a given level of segmentation accuracy?” We found that the Jaccard similarity between the correct segmentation and the segmentation predicted by a unigram model follows a Zero-One inflated Beta distribution with four parameters. We also found that each of these four parameters is a function of the sample size (tweet count) used for dictionary construction, implying that we can compute the Jaccard similarity distribution once the tweet count of the dictionary is known. Having this model allows us to compute the number of tweets required for a given level of hashtag segmentation accuracy, and also allows us to compare other segmentation models to this known distribution.
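
As a rough illustration of the two ingredients named in the abstract, the sketch below segments a hashtag with a unigram dictionary and scores the result with Jaccard similarity. The function names, the dynamic-programming formulation and the toy probabilities are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math


def segment(hashtag, unigram_probs):
    """Split `hashtag` into the most probable sequence of dictionary words.

    Assumption: words are independent (unigram model), so the best split
    maximises the sum of log-probabilities. Falls back to the whole
    hashtag when no full split into dictionary words exists.
    """
    n = len(hashtag)
    best_score = [-math.inf] * (n + 1)
    best_start = [None] * (n + 1)
    best_score[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            word = hashtag[start:end]
            if word in unigram_probs and best_score[start] > -math.inf:
                score = best_score[start] + math.log(unigram_probs[word])
                if score > best_score[end]:
                    best_score[end] = score
                    best_start[end] = start
    if best_score[n] == -math.inf:
        return [hashtag]
    words, end = [], n
    while end > 0:  # walk the back-pointers to recover the split
        start = best_start[end]
        words.append(hashtag[start:end])
        end = start
    return words[::-1]


def jaccard(predicted, correct):
    """Jaccard similarity between two segmentations, treated as word sets."""
    a, b = set(predicted), set(correct)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy dictionary; real probabilities would come from tweets.
toy_probs = {"data": 0.3, "science": 0.2, "dat": 0.01, "a": 0.1,
             "sci": 0.01, "ence": 0.01}
predicted = segment("datascience", toy_probs)
```

A smaller tweet sample yields a sparser dictionary, which is how sample size would feed through to the Jaccard scores in a setup like this.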

    The effect of assessor coverage and assessor accuracy on rank aggregation precision

    Rank aggregation is the process of aggregating multiple rankings provided by multiple assessors, of a given set of items, into a single ranking. Each assessor, whether it be human or computer based, is a resource that we use to obtain the multiple rankings. The accuracy of the aggregated ranking depends on the accuracy of the assessor ranking and the assessor coverage of the items. Our question is, given limited assessment resources, should each assessor rank many items to obtain item coverage, spending little time on each item, or should each assessor rank only a few items, but spend more time on each item to obtain a high accuracy ranking? In this article, we take a first step towards answering this question, by developing a model, based on simulation, showing the effect of the number of items assigned to an assessor and the accuracy of the assessment on the precision of the aggregated ranking. We find that when using Binomial allocation of items to assessors, increasing the assessor accuracy provides a greater increase in aggregated rank accuracy

    By the power of Grayskull : small sample statistical power in information retrieval evaluation

    Information Retrieval evaluation is typically performed using a sample of queries, and a statistical hypothesis test is used to make inferences about the system's accuracy on the population of queries. Research has shown that the t test is one of a set of tests that provides the greatest statistical power while maintaining acceptable type I error rates, when evaluating with a large sample of queries. In this article, we investigate the effect of using a small query sample, for which the hypothesis tests may not satisfy Central Limit Theorem conditions, on the control of the type I error rate and the change in type II error rate of a given set of hypothesis tests. We found that all tests performed similarly for unpaired tests. We also found that the bootstrap test provided greater power for the paired test, but violated the desired type I error rate for the smallest sample size (5 queries).
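
For readers unfamiliar with the paired bootstrap test mentioned above, the following Python sketch shows one common variant. It is an assumption about the general technique, not the authors' procedure: the function name, the shift-to-null resampling scheme and the default number of resamples are all illustrative choices.

```python
import random
from statistics import mean


def paired_bootstrap_p_value(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired bootstrap test on per-query score differences.

    Assumption: the null hypothesis is a mean difference of zero, imposed
    by centring the observed differences before resampling.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    centred = [d - observed for d in diffs]  # differences under the null
    extreme = 0
    for _ in range(n_resamples):
        resample = rng.choices(centred, k=len(centred))
        if abs(mean(resample)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

With only five queries there are few distinct resamples, which is consistent with the abstract's observation that the test can misbehave at the smallest sample size.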

    The use of lifts for emergency evacuation - a reliability study

    Vertical egress from high-rise buildings is a challenge for occupants with mobility impairments. Past research indicates that use of lifts is a feasible option to supplement conventional stair evacuation. However, one facet of lift evacuation has limited available data: lift reliability. This research investigates lift reliability through a sample of 81 general passenger lifts in service throughout Australia. The sample period covered the past three years. Lift service records were used to ascertain the downtime due to various faults and maintenance. Statistical analyses of the raw data were conducted to obtain the reliability and its probability distribution. The average reliability of lifts was found to be 0.993. This result was compared with that obtained by others in the past.
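
To give a feel for the reported figure, the sketch below assumes that reliability means the fraction of the sample period during which a lift was in service. The abstract does not state the definition used, so this interpretation, the function names and the three-year period in hours are assumptions for illustration only.

```python
def availability(downtime_hours, period_hours):
    """Fraction of the period in service (assumed meaning of reliability)."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must lie within the period")
    return 1.0 - downtime_hours / period_hours


def implied_downtime_hours(reliability, period_hours):
    """Downtime implied by a given reliability over the period."""
    return (1.0 - reliability) * period_hours


# Three years of continuous service, ignoring leap days (an assumption).
THREE_YEARS_HOURS = 3 * 365 * 24
downtime = implied_downtime_hours(0.993, THREE_YEARS_HOURS)
```

Under this assumption, a reliability of 0.993 over three years corresponds to about 184 hours, or roughly 7.7 days, out of service per lift.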

    Micropylar seed coat restraint and embryonic response to heat shock and smoke control seed dormancy in Grevillea juniperina

    Seeds of some eastern Australian Grevillea species show the characteristics of non-deep physiological dormancy, which is broken by exposure to heat shock and/or smoke. The current study tested whether the restrictive effect of the seed coat on germination was localized to specific regions, whether the fire cues affected the growth potential of the embryo, the mechanical strength of the seed coat itself, and the anatomy of fracturing of the seed coat. Removal of the micropylar seed coat allowed germination, while retaining it in place restricted germination. The growth potential of the embryo was increased by exposure to heat shock or to smoke, and increased the most if exposed to both cues. Estimation of the minimum force required by embryos to germinate from intact seeds suggested that this force was reduced for seeds treated with fire cues. The fire cues did not affect the resistance of the seed coat to compressive force when tested after 24 h of imbibition. Fracturing of the seed coat occurred between cell walls, except for the palisade layer, where fracturing occurred across palisade and sclerenchyma cells. While the micropylar end of the seed coat imposes dormancy, most likely by mechanical constraint, heat shock and smoke overcome dormancy by increasing the embryo’s growth potential and possibly weakening the seed coat, either directly or via the embryo

    ChIPseqR : analysis of ChIP-seq experiments

    Background: The use of high-throughput sequencing in combination with chromatin immunoprecipitation (ChIP-seq) has enabled the study of genome-wide protein binding at high resolution. While the amount of data generated from such experiments is steadily increasing, the methods available for their analysis remain limited. Although several algorithms for the analysis of ChIP-seq data have been published, they focus almost exclusively on transcription factor studies and are usually not well suited to the analysis of other types of experiments. Results: Here we present ChIPseqR, an algorithm for the analysis of nucleosome positioning and histone modification ChIP-seq experiments. The performance of this novel method is studied on short-read sequencing data of Arabidopsis thaliana mononucleosomes as well as on simulated data. Conclusions: ChIPseqR is shown to improve sensitivity and spatial resolution over existing methods while maintaining high specificity. Further analysis of predicted nucleosomes reveals characteristic patterns in nucleosome sequences and placement.