178 research outputs found

    Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks

    Background: We generalized penalized canonical correlation analysis for analyzing microarray gene-expression measurements for checking completeness of known metabolic pathways and identifying candidate genes for incorporation in the pathway. We used Wold's method for calculation of the canonical variates, and we applied ridge penalization to the regression of pathway genes on canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes.
    Results: We performed a small simulation to illustrate the model's capability to identify new candidate genes to incorporate in the pathway: in our simulations it appeared that a gene was correctly identified if the correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12,209 genes measured in 45 patients with glioblastoma, and we considered genes to incorporate in the glioma-pathway: we identified more than 25 genes that correlated > 0.9 with canonical variates of the pathway genes.
    Conclusion: We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.

    Age-dependent prevalence of 14 high-risk HPV types in the Netherlands: implications for prophylactic vaccination and screening

    We determined the prevalence of type-specific hrHPV infections in the Netherlands on cervical scrapes of 45 362 women aged 18–65 years. The overall hrHPV prevalence peaked at age 22, at 24%. The prevalence of each of the 14 hrHPV types decreased significantly with age (P-values between 0.0009 and 0.03). The proportion of HPV16 in hrHPV-positive infections also decreased with age (OR=0.76 (10-year scale), 95% CI=0.67–0.85), and a similar trend was observed for HPV16 when selecting hrHPV-positive women with cervical intraepithelial neoplasia grade 2 or worse (CIN2+) (OR=0.76, 95% CI=0.56–1.01). Among women eligible for routine screening (age 29–61 years) with confirmed CIN2+, 65% were infected with HPV16 and/or HPV18. When HPV16/18-positive infections in women eligible for routine screening were discarded, the positive predictive value of cytology for the detection of CIN2+ decreased from 27 to 15%, the positive predictive value of hrHPV testing decreased from 26 to 15%, and the predictive value of a double-positive test (positive HPV test and a positive cytology) decreased from 54 to 41%. In women vaccinated against HPV16/18, screening remains important to detect cervical lesions caused by non-HPV16/18 types. To maintain a high positive predictive value, screening algorithms must be carefully re-evaluated with regard to the screening modalities and length of the screening interval.
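
The drop in positive predictive value described in this abstract follows from the definition PPV = TP / (TP + FP): discarding HPV16/18-positive women removes a large share of the true positives but a smaller share of the false positives. The sketch below only illustrates that arithmetic. All counts are hypothetical, are not taken from the study, and were chosen so that the PPV moves from 0.27 to about 0.15.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of positive tests that are true cases."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, NOT taken from the study.
tp_all, fp_all = 270, 730      # all screen-positive women (with / without CIN2+)
tp_1618, fp_1618 = 160, 107    # the subset that is HPV16/18-positive

before = ppv(tp_all, fp_all)
# Discard the HPV16/18-positive women from both groups and recompute.
after = ppv(tp_all - tp_1618, fp_all - fp_1618)

print(f"PPV before: {before:.2f}, after: {after:.2f}")
```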

    Feature Extraction and Random Forest to Identify Sheep Behavior from Accelerometer Data

    Sensor technologies play an essential part in the agricultural community and many other scientific and commercial communities. Accelerometer signals and machine learning techniques can be used to identify and observe the behaviours of animals without the need for exhaustive human observation, which is labour intensive and time consuming. This study employed a random forest algorithm to identify grazing, walking, scratching, and inactivity (standing, resting) of 8 Hebridean ewes located in Shotwick, Cheshire, in the UK. We gathered accelerometer data from a sensor device fitted on the collar of the animals. The algorithm was selected on the basis of previous research in which random forest achieved the best results among benchmark techniques. Therefore, in this study, more focus was given to feature engineering to improve prediction performance. Seventeen features from the time and frequency domains were calculated from the accelerometer measurements and the magnitude of the acceleration. Feature elimination was used to remove highly correlated features, and only nine of the seventeen features were selected. The algorithm achieved an overall accuracy of 99.43% and a kappa value of 98.66%. The accuracy for grazing, walking, scratching, and inactive was 99.08%, 99.13%, 99.90%, and 99.85%, respectively. The overall results showed a significant improvement over previous methods and studies for all mutually exclusive behaviours. These results are promising, and the technique could be further tested for future real-time activity recognition.
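
The abstract above reports both overall accuracy and a kappa value. As a rough illustration of how the two relate, the sketch below computes them from a confusion matrix, assuming the kappa is Cohen's kappa (the abstract does not say which variant was used). The function name and the 2-class matrix are hypothetical and are not the study's data.

```python
def accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical confusion matrix, NOT taken from the study.
cm = [[45, 5],
      [10, 40]]
acc, kappa = accuracy_and_kappa(cm)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

The same function works for the four behaviour classes of the study if given a 4×4 matrix. Kappa is lower than accuracy here because it discounts the agreement expected by chance.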

    Improved ability of biological and previous caries multimarkers to predict caries disease as revealed by multivariate PLS modelling

    Background: Dental caries is a chronic disease with plaque bacteria, diet and saliva modifying disease activity. Here we have used the PLS method to evaluate a multiplicity of such biological variables (n = 88) for ability to predict caries in a cross-sectional (baseline caries) and prospective (2-year caries development) setting.
    Methods: Multivariate PLS modelling was used to associate the many biological variables with caries recorded in thirty 14-year-old children by measuring the numbers of incipient and manifest caries lesions at all surfaces.
    Results: A wide but shallow gliding scale of one fifth caries promoting or protecting, and four fifths non-influential, variables occurred. The influential markers behaved in the order of plaque bacteria > diet > saliva, with previously known plaque bacteria/diet markers and a set of new protective diet markers. A differential variable patterning appeared for new versus progressing lesions. The influential biological multimarkers (n = 18) predicted baseline caries better (ROC area 0.96) than five markers (0.92) and a single lactobacilli marker (0.7) with sensitivity/specificity of 1.87, 1.78 and 1.13 at 1/3 of the subjects diagnosed sick, respectively. Moreover, biological multimarkers (n = 18) explained 2-year caries increment slightly better than reported before but predicted it poorly (ROC area 0.76). By contrast, multimarkers based on previous caries predicted alone (ROC area 0.88), or together with biological multimarkers (0.94), increment well with a sensitivity/specificity of 1.74 at 1/3 of the subjects diagnosed sick.
    Conclusion: Multimarkers behave better than single-to-five markers but future multimarker strategies will require systematic searches for improved saliva and plaque bacteria markers.

    Systematic Evaluation of Factors Influencing ChIP-Seq Fidelity

    We performed a systematic evaluation of how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected and had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. The removal of reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle datasets with deep coverage.

    Sediment properties as important predictors of carbon storage in Zostera marina meadows: a comparison of four European areas

    Seagrass ecosystems are important natural carbon sinks but their efficiency varies greatly depending on species composition and environmental conditions. What causes this variation is not fully known, and understanding it could have important implications for managing and protecting seagrass habitat so that it continues to act as a natural carbon sink. Here, we assessed sedimentary organic carbon in Zostera marina meadows (and adjacent unvegetated sediment) in four distinct areas of Europe (Gullmar Fjord on the Swedish Skagerrak coast, Asko in the Baltic Sea, Sozopol in the Black Sea and Ria Formosa in southern Portugal) down to ~35 cm depth. We also tested how sedimentary organic carbon in Z. marina meadows relates to different sediment characteristics, a range of seagrass-associated variables and water depth. The seagrass carbon storage varied greatly among areas, with an average organic carbon content ranging from 2.79 ± 0.50% in the Gullmar Fjord to 0.17 ± 0.02% in the area of Sozopol. We found that a high proportion of fine grain size, high porosity and low density of the sediment are strongly related to high carbon content in Z. marina sediment. We suggest that sediment properties should be included as an important factor when evaluating high priority areas in management of Z. marina generated carbon sinks.

    Optimizing the procedure of grain nutrient predictions in barley via hyperspectral imaging

    Hyperspectral imaging enables researchers and plant breeders to analyze various traits of interest, such as nutritional value, in high throughput. To achieve this, an optimally designed and reliable calibration model, linking the measured spectra with the investigated traits, is necessary. In the present study we investigated the impact of different regression models, calibration set sizes and calibration set compositions on prediction performance. For this purpose, we analyzed concentrations of six globally relevant grain nutrients of the wild barley population HEB-YIELD as a case study. The data comprised 1,593 plots, grown in 2015 and 2016 at the locations Dundee and Halle, all of which were analyzed through traditional laboratory methods and hyperspectral imaging. The results indicated that a linear regression model based on partial least squares outperformed neural networks in this particular data modelling task. There was a positive relationship between the number of samples in a calibration model and prediction performance, with a local optimum at a calibration set size of ~40% of the total data. The inclusion of samples from several years and locations clearly improved the predictions of the investigated nutrient traits at small calibration set sizes. Expanding calibration models with additional samples is only useful as long as those samples increase trait variability. Models obtained in a certain environment were transferable to other environments only to a limited extent. They should therefore be successively upgraded with new calibration data to enable a reliable prediction of the desired traits. The presented results will assist the design and conceptualization of future hyperspectral imaging projects in order to achieve reliable predictions. They will also help to establish practical applications of hyperspectral imaging systems, for instance in plant breeding concepts.