167 research outputs found

    Efficient algorithms for decision tree cross-validation

    Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of a straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results. Comment: 9 pages, 6 figures. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478
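    The abstract does not spell out the adaptation, so the following is only a minimal sketch of one ingredient of the general idea, assuming the k folds are fixed in advance. A tree learner scores a candidate split from class counts on each side. These counts can be collected once over all data and once per fold, and the counts for each of the k training sets then follow by subtraction, instead of k separate passes over the data. The function name `fold_split_counts` and its arguments are hypothetical.

```python
from collections import Counter

def fold_split_counts(rows, fold_of, k, feature, threshold):
    """Class counts on each side of one candidate split, for every
    cross-validation training set, gathered in a single pass.

    rows: list of (features_dict, label); fold_of[i] is the fold of row i.
    Returns a list of k (left_counts, right_counts) pairs; pair j describes
    the training set that excludes fold j.
    """
    total = (Counter(), Counter())                    # counts over all rows
    per_fold = [(Counter(), Counter()) for _ in range(k)]
    for i, (x, y) in enumerate(rows):
        side = 0 if x[feature] <= threshold else 1    # 0 = left, 1 = right
        total[side][y] += 1
        per_fold[fold_of[i]][side][y] += 1
    # Training set j = all rows minus fold j, so its counts follow by subtraction.
    return [(total[0] - f[0], total[1] - f[1]) for f in per_fold]
```

    The same subtraction trick applies to every candidate split, so the per-fold statistics come at roughly the cost of one pass rather than k.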

    Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia

    Background: This paper presents a retrospective statistical study of the newly released data set from the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limited demographic and clinical data for each subject. Previous studies using statistical classification or machine learning algorithms have focused on gene expression data only. The present paper investigates whether such techniques can benefit from including demographic and clinical data. Results: We compare six classification algorithms: support vector machines (SVMs), nearest shrunken centroids, decision trees, ensemble of voters, naïve Bayes, and nearest neighbor. SVMs outperform the other algorithms. Using expression data only, they yield an area under the ROC curve of 0.92 for bipolar disorder versus control, and 0.91 for schizophrenia versus control. By including demographic and clinical data, classification performance improves to 0.97 and 0.94 respectively. Conclusion: This paper demonstrates that SVMs can distinguish bipolar disorder and schizophrenia from normal control at a very high rate. Moreover, it shows that classification performance improves by including demographic and clinical data. We also found that some variables in this data set, such as alcohol and drug use, are strongly associated with the diseases. These variables may affect gene expression and make it more difficult to identify genes that are directly associated with the diseases. Stratification can correct for such variables, but we show that this reduces the power of the statistical methods.

    Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

    This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
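    The abstract states only that an appropriate distance metric is defined, not its form. As an illustration, here is a minimal sketch of one plausible choice, assumed rather than taken from the paper: each gene becomes a 0/1 vector over all classes (a gene in a class is also placed in every ancestor class), and two vectors are compared with a Euclidean distance in which a class at depth d gets weight w0**d, so disagreements deep in the hierarchy cost less than those near the top. The names `class_vector`, `hmc_distance`, and the value w0 = 0.75 are hypothetical.

```python
import math

def class_vector(labels, parent, classes):
    """0/1 vector over `classes`; membership propagates to all ancestors."""
    members = set()
    for c in labels:
        node = c
        while node is not None:          # walk up to the root
            members.add(node)
            node = parent.get(node)
    return [1.0 if c in members else 0.0 for c in classes]

def depth(c, parent):
    """Depth of class c; top-level classes have depth 1."""
    d = 1
    while parent.get(c) is not None:
        c = parent[c]
        d += 1
    return d

def hmc_distance(v1, v2, classes, parent, w0=0.75):
    """Weighted Euclidean distance; a class at depth d has weight w0**d (assumed)."""
    return math.sqrt(sum(
        w0 ** depth(c, parent) * (a - b) ** 2
        for c, a, b in zip(classes, v1, v2)))
```

    With `parent = {"01": None, "01.01": "01", "02": None}`, a gene annotated only with "01.01" is closer to one annotated with "01" than to one annotated with "02", because it shares the top-level class.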

    Coexpression and interaction of CXCL10 and CD26 in mesenchymal cells by synergising inflammatory cytokines: CXCL8 and CXCL10 are discriminative markers for autoimmune arthropathies

    Leukocyte infiltration during acute and chronic inflammation is regulated by exogenous and endogenous factors, including cytokines, chemokines and proteases. Stimulation of fibroblasts and human microvascular endothelial cells with the inflammatory cytokines interleukin-1β (IL-1β) or tumour necrosis factor alpha (TNF-α) combined with either interferon-α (IFN-α), IFN-β or IFN-γ resulted in a synergistic induction of the CXC chemokine CXCL10, but not of the neutrophil chemoattractant CXCL8. In contrast, simultaneous stimulation with different IFN types did not result in a synergistic CXCL10 protein induction. Purification of natural CXCL10 from the conditioned medium of fibroblasts led to the isolation of CD26/dipeptidyl peptidase IV-processed CXCL10 missing two NH(2)-terminal residues. In contrast to intact CXCL10, NH(2)-terminally truncated CXCL10(3–77) did not induce extracellular signal-regulated kinase 1/2 or Akt/protein kinase B phosphorylation in CXC chemokine receptor 3-transfected cells. Together with the expression of CXCL10, the expression of membrane-bound CD26/dipeptidyl peptidase IV was also upregulated in fibroblasts by IFN-γ, by IFN-γ plus IL-1β or by IFN-γ plus TNF-α. This provides a negative feedback for CXCL10-dependent chemotaxis of activated T cells and natural killer cells. Since TNF-α and IL-1β are implicated in arthritis, synovial concentrations of CXCL8 and CXCL10 were compared in patients suffering from crystal arthritis, ankylosing spondylitis, psoriatic arthritis and rheumatoid arthritis. All three groups of autoimmune arthritis patients (ankylosing spondylitis, psoriatic arthritis and rheumatoid arthritis) had significantly increased synovial CXCL10 levels compared with crystal arthritis patients. In contrast, compared with crystal arthritis, only rheumatoid arthritis patients, and not ankylosing spondylitis or psoriatic arthritis patients, had significantly higher synovial CXCL8 concentrations. Synovial concentrations of the neutrophil chemoattractant CXCL8 may therefore be useful to discriminate between autoimmune arthritis types.

    PF-4var/CXCL4L1 Predicts Outcome in Stable Coronary Artery Disease Patients with Preserved Left Ventricular Function

    Background: Platelet-derived chemokines are implicated in several aspects of vascular biology. However, the role in heart disease of the chemokine platelet factor 4 variant (PF-4var/CXCL4L1), which is released by platelets during thrombosis and has properties different from those of PF-4/CXCL4, has not yet been studied. We evaluated the determinants and prognostic value of the platelet-derived chemokines PF-4var, PF-4 and RANTES/CCL5 in patients with stable coronary artery disease (CAD). Methodology/Principal Findings: From 205 consecutive patients with stable CAD and preserved left ventricular (LV) function, blood samples were taken at inclusion and were analyzed for PF-4var, RANTES, platelet factor-4 and N-terminal pro-B-type natriuretic peptide (NT-proBNP). Patients were followed (median follow-up 2.5 years) for the combined endpoint of cardiac death, non-fatal acute myocardial infarction, stroke or hospitalization for heart failure. Independent determinants of PF-4var levels (median 10 ng/ml; interquartile range 8-16 ng/ml) were age, gender and circulating platelet number. Patients who experienced cardiac events (n = 20) during follow-up showed lower levels of PF-4var (8.5 [5.3-10] ng/ml versus 12 [8-16] ng/ml, p = 0.033). ROC analysis for events showed an area under the curve (AUC) of 0.82 (95% CI 0.73-0.90, p<0.001) for higher NT-proBNP levels and an AUC of 0.32 (95% CI 0.19-0.45, p = 0.009) for lower PF-4var levels. Cox proportional hazard analysis showed that PF-4var has independent prognostic value in addition to NT-proBNP. Conclusions: We conclude that low PF-4var/CXCL4L1 levels are associated with a poor outcome in patients with stable CAD and preserved LV function. This prognostic value is independent of NT-proBNP levels, suggesting that both neurohormonal and platelet-related factors determine outcome in these patients.
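    An AUC below 0.5, as reported for PF-4var, means that the marker ranks patients in the direction opposite to the one being scored: patients with events tend to have lower, not higher, values. The minimal sketch below (the function name `auc` is hypothetical, and the study's own ROC software is not stated) computes AUC as the probability that a randomly chosen positive scores above a randomly chosen negative. Reversing the direction of the scores turns AUC into 1 - AUC, so 0.32 corresponds to 1 - 0.32 = 0.68 when lower PF-4var is treated as the risk signal.

```python
def auc(scores, labels):
    """ROC AUC via the rank (Mann-Whitney) formulation.

    Probability that a random positive (label 1) scores higher than a random
    negative (label 0); ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    For example, `auc([3, 5, 9, 12], [1, 0, 1, 0])` is 0.25, and negating the scores gives 0.75.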

    Glucosylsphingosine Is a Highly Sensitive and Specific Biomarker for Primary Diagnostic and Follow-Up Monitoring in Gaucher Disease in a Non-Jewish, Caucasian Cohort of Gaucher Disease Patients

    Gaucher disease (GD) is the most common lysosomal storage disorder (LSD). Caused by a deficiency of β-glucocerebrosidase, it leads to an accumulation of glucosylceramide. Standard diagnostic procedures include measurement of enzyme activity, genetic testing, and analysis of chitotriosidase and CCL18/PARC as biomarkers. Even though chitotriosidase is the most well-established biomarker in GD, it is not specific for GD. Furthermore, it may be falsely negative in a significant percentage of GD patients due to a mutation. Additionally, chitotriosidase reflects changes in the course of the disease only with a delay. This further underlines the need for a reliable biomarker, especially for monitoring the disease and the impact of potential treatments. Here, we evaluated the sensitivity and specificity of the previously reported biomarker Glucosylsphingosine with regard to different control groups (healthy controls vs. GD carriers vs. other LSDs). Only GD patients displayed Glucosylsphingosine levels higher than 12 ng/ml, whereas the comparison control groups showed concentrations below this pathological cut-off, verifying the specificity of Glucosylsphingosine as a biomarker for GD. In addition, we evaluated the biomarker before and during enzyme replacement therapy (ERT) in 19 patients, demonstrating a decrease in Glucosylsphingosine over time, with the most pronounced reduction within the first 6 months of ERT. Furthermore, our data reveal a correlation between the medical consequences of specific mutations and Glucosylsphingosine levels. In summary, Glucosylsphingosine is a very promising, reliable and specific biomarker for GD.
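    Sensitivity and specificity at a fixed cut-off such as 12 ng/ml reduce to counting the four cells of a 2x2 table. The minimal sketch below assumes the rule "value above the cut-off means GD"; the function name `sens_spec` is hypothetical, and the sketch does not reproduce the study's data.

```python
def sens_spec(values, is_patient, cutoff):
    """Sensitivity and specificity of the rule 'value > cutoff means disease'."""
    pairs = list(zip(values, is_patient))
    tp = sum(1 for v, d in pairs if d and v > cutoff)        # patients above cut-off
    fn = sum(1 for v, d in pairs if d and v <= cutoff)       # patients missed
    tn = sum(1 for v, d in pairs if not d and v <= cutoff)   # controls below cut-off
    fp = sum(1 for v, d in pairs if not d and v > cutoff)    # controls flagged
    return tp / (tp + fn), tn / (tn + fp)
```

    The abstract's finding that no control exceeded 12 ng/ml corresponds to fp = 0, that is, a specificity of 1 at that cut-off in the groups studied.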

    Opal-CT precipitation in a clayey soil explained by geochemical transport model of dissolved Si (Blégny, Belgium)

    Opal-CT precipitation controlling dissolved Si export
    The dissolved Si (DSi) exported by rivers is controlled by geological, hydrological and biological cycle processes [1]. The DSi concentrations measured in a river of an upstream catchment in eastern Belgium (Blégny, Land of Herve) do not vary seasonally (6.91±0.94 mg L⁻¹; n=363). Si concentrations in pore water are often higher and more variable (8.65±3.65 mg L⁻¹; n=128). The decrease of DSi along the flowpath of water is due to sink processes, i.e. precipitation, adsorption or uptake by vegetation. As the DSi in the river does not show any seasonal variation, uptake by vegetation can be ruled out [1], whereas precipitation or adsorption can control the DSi drained by the stream water. This hypothesis is confirmed by XRD and DeMaster analyses. At 0.1 m depth the soil consists of 62% quartz, 7% K-feldspar, 6% plagioclase, 3.2% carbonates, 18.9% Al-clay, 1.47% kaolinite, 0.63% chlorite and 0.2% amorphous Si, probably of biogenic origin. At 1.5 m depth, the amounts of several minerals (35.8% quartz, 0.6% K-feldspar, 0.9% plagioclase, 14.7% Al-clay) drop drastically. Carbonates, chlorite and kaolinite are absent, whereas 40.4% opal-CT appears. The precipitation of opal-CT controls the DSi export of this catchment.
    Development of a geochemical transport model
    To describe DSi export from a catchment, a geochemical transport model is developed in HP1, which couples the water flux model Hydrus with the geochemical model PHREEQC [2]. Our model is based on the conceptual model developed in [3]. First results show different DSi export dynamics in the unsaturated zone than in the aquifer, due to different pCO2 values and varying soil moisture conditions. Further development of the model will help to determine the cause of opal-CT precipitation in this setting.
    [1] Fulweiler, Nixon (2005) Biogeochemistry 74:115–130.
    [2] Simunek, Jacques, van Genuchten, Mallants (2006) JAWRA 42:1537–1547.
    [3] Ronchi et al. (2013) Silicon 5(1):115–133.

    Post Hoc Analysis of the PATRICIA Randomized Trial of the Efficacy of Human Papillomavirus Type 16 (HPV-16)/HPV-18 AS04-Adjuvanted Vaccine against Incident and Persistent Infection with Nonvaccine Oncogenic HPV Types Using an Alternative Multiplex Type-Specific PCR Assay for HPV DNA

    The efficacy of the human papillomavirus type 16 (HPV-16)/HPV-18 AS04-adjuvanted vaccine against cervical infections with HPV in the Papilloma Trial against Cancer in Young Adults (PATRICIA) was evaluated using a combination of the broad-spectrum L1-based SPF10 PCR-DNA enzyme immunoassay (DEIA)/line probe assay (LiPA(25)) system with type-specific PCRs for HPV-16 and -18. Broad-spectrum PCR assays may underestimate the presence of HPV genotypes present at relatively low concentrations in multiple infections, due to competition between genotypes. Therefore, samples were retrospectively reanalyzed using a testing algorithm incorporating the SPF10 PCR-DEIA/LiPA(25) plus a novel E6-based multiplex type-specific PCR and reverse hybridization assay (MPTS12 RHA), which permits detection of a panel of nine oncogenic HPV genotypes (types 16, 18, 31, 33, 35, 45, 52, 58, and 59). For the vaccine against HPV types 16 and 18, there was no major impact on estimates of vaccine efficacy (VE) for incident or 6-month or 12-month persistent infections when the MPTS12 RHA was included in the testing algorithm versus estimates with the protocol-specified algorithm. However, the alternative testing algorithm showed greater sensitivity than the protocol-specified algorithm for detection of some nonvaccine oncogenic HPV types. More cases were gained in the control group than in the vaccine group, leading to higher point estimates of VE for 6-month and 12-month persistent infections for the nonvaccine oncogenic types included in the MPTS12 RHA assay (types 31, 33, 35, 45, 52, 58, and 59). This post hoc analysis indicates that the per-protocol testing algorithm used in PATRICIA underestimated the VE against some nonvaccine oncogenic HPV types and that the choice of the HPV DNA testing methodology is important for the evaluation of VE in clinical trials.
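    Why gaining more cases in the control group raises the VE estimate follows from the usual definition of VE as one minus the relative risk. The minimal sketch below uses simple attack rates; the trial's own estimator may differ (for example, it may use person-time), and the function name `vaccine_efficacy` and all counts are hypothetical.

```python
def vaccine_efficacy(cases_vacc, n_vacc, cases_ctrl, n_ctrl):
    """VE = 1 - (attack rate in vaccine arm) / (attack rate in control arm)."""
    rate_vacc = cases_vacc / n_vacc
    rate_ctrl = cases_ctrl / n_ctrl
    return 1.0 - rate_vacc / rate_ctrl
```

    With hypothetical arms of 1,000 women each, 10 versus 40 cases give VE = 0.75. If a more sensitive assay detects 2 extra cases in the vaccine arm but 10 extra in the control arm (12 versus 50), VE rises to 0.76.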

    Predicting gene function using hierarchical multi-label decision tree ensembles

    Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
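    The abstract does not say how the trees in an ensemble are combined. A common choice, assumed here, is to average each tree's per-class probability vector and then apply a threshold. This assumed scheme has a useful property: if every tree predicts a parent class with at least the probability of each of its children, so does the average, so the thresholded set of classes still respects the hierarchy. The function name `ensemble_predict` and the default threshold of 0.5 are hypothetical.

```python
def ensemble_predict(tree_outputs, threshold=0.5):
    """Average per-class probability vectors from several trees, then threshold.

    tree_outputs: one vector per tree, all aligned to the same class order.
    Returns (mean_vector, indices_of_predicted_classes).
    """
    n = len(tree_outputs)
    mean = [sum(col) / n for col in zip(*tree_outputs)]   # per-class average
    predicted = [i for i, p in enumerate(mean) if p >= threshold]
    return mean, predicted
```

    Lowering the threshold trades precision for recall; because parents never score below their children under the stated assumption, any threshold yields a hierarchy-consistent prediction.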